TITLE OF THE INVENTION



METHOD FOR DETERMINING THE NUCLEOTIDE SEQUENCE 

OF A POLYNUCLEOTIDE 



FIELD OF THE INVENTION

The present invention relates to nucleic acid chemistry, and more
specifically to a method for determining the nucleotide sequence of a
polynucleotide [...] embody or employ such a method.

BACKGROUND OF THE INVENTION 

The determination of the nucleotide sequence of a polynucleotide has
substantial utility in medicine, forensics, biomedical research, and in the
determination of paternity and identity. Several methods for determining the
nucleotide sequence of a polynucleotide have been identified.

I. Nucleic Acid Sequencing

Initial attempts to determine the sequence of a DNA molecule were
extensions of techniques which had been initially developed to permit [...]
(Brownlee, G.G. et al., J. Mol. Biol. 34:379 (1968)). Such methods involved the
specific [...] 241:34 (1973)), (2) nearest neighbor analysis (Wu, R., et al., [...]
(1971)), and (3) the "Wandering Spot" method (Sanger, F., et al., [...]).

The most commonly used methods of nucleic acid sequencing are the
"dideoxy-mediated chain termination method," also known as the "Sanger
Method" (Sanger, F., et al., J. Molec. Biol. 94:441 (1975); Prober, J. et al.,
Science 238:336-340 (1987)) and the "chemical degradation method," also known
as the "Maxam-Gilbert method" (Maxam, A.M., et al., Proc. Natl. Acad. Sci.
(U.S.A.) 74:560 (1977), both references herein incorporated by reference).

In the dideoxy-mediated or "Sanger" chain termination method of DNA
sequencing, the sequence of a DNA molecule is obtained through the extension
of an oligonucleotide primer which is hybridized to the nucleic acid molecule
being sequenced. In brief, four separate primer extension reactions are
conducted. In each reaction, a DNA polymerase is added along with the four
nucleotide triphosphates needed to polymerize DNA. Each of the reactions is
carried out in the additional presence of a 2',3' dideoxy derivative of the A, T, C,
or G nucleoside triphosphates. Such derivatives differ from conventional
nucleotide triphosphates in that they lack a hydroxyl residue at the 3' position
of deoxyribose. Thus, although they can be incorporated by a DNA polymerase
into the newly synthesized primer extension, the absence of the 3' hydroxyl
group causes them to be incapable of forming a phosphodiester bond with a
succeeding nucleotide triphosphate. Thus, the incorporation of a dideoxy
derivative results in the termination of the extension reaction. Since the
dideoxy derivatives are present in lower concentrations than their
corresponding, conventional nucleotide triphosphate analogs, the net result of
each of the four reactions is to produce a set of nested oligonucleotides each of
which is terminated by the particular dideoxy derivative used in the reaction.

By subjecting the reaction products of each of the extension reactions to
electrophoresis, it is possible to obtain a series of four "ladders." Since the
position of each "rung" of the ladder is determined by the size of the molecule,
and since such size is determined by the incorporation of the dideoxy
derivative, the appearance and location of a particular "rung" can be readily
translated into the sequence of the extended primer. Thus, through an
electrophoretic analysis, the sequence of the extended primer can be
determined.
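
As a rough illustration of how the four ladders translate into a sequence, the following Python sketch models an idealized gel. The function names, the example template, and the assumptions that every fragment length is resolved and that lengths are counted from the end of the primer are illustrative only and are not taken from the cited methods.

```python
# Idealized model of reading a dideoxy sequencing gel (illustrative only).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def extension_product(template_3to5):
    """Strand synthesized 5'->3' opposite a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def run_four_lanes(extended):
    """One lane per dideoxy base: the fragment lengths ("rungs") it terminates."""
    lanes = {base: set() for base in "ACGT"}
    for length, base in enumerate(extended, start=1):
        lanes[base].add(length)
    return lanes


def read_ladders(lanes):
    """Read the rungs from shortest to longest to recover the extended strand."""
    rung_to_base = {}
    for base, rungs in lanes.items():
        for length in rungs:
            rung_to_base[length] = base
    return "".join(rung_to_base[length] for length in sorted(rung_to_base))


# Hypothetical eight-base template, written 3'->5'.
extended = extension_product("TACGGTCA")
lanes = run_four_lanes(extended)
print(read_ladders(lanes))
```

In this model each rung appears in exactly one lane, so sorting the rungs by length is enough to read the extended primer base by base.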

One deficiency of the dideoxy-mediated sequencing method is the need 
to optimize the ratio of dideoxy nucleoside triphosphates to conventional 
nucleoside triphosphates in the chain-extension/chain-termination reactions. 
Such adjustments are needed in order to maximize the amount of information 
which can be obtained from each primer. Additionally, the efficiency of 
dideoxy nucleotide incorporation in a particular target molecule is partially 
dependent upon the primary and secondary structures of the target.

The dideoxy-mediated method thus requires single-stranded templates, 
specific oligonucleotide primers, and high quality preparations of a DNA 
polymerase (typically the Klenow fragment of DNA polymerase I).

Initially, these requirements delayed the widespread use of the method.
However, with the ready availability of synthetic primers, and the availability
of bacteriophage M13 and phagemid vectors (Maniatis, T., et al., Molecular
Cloning, A Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold
Spring Harbor, New York (1989), herein incorporated by reference), the
dideoxy-mediated chain termination method is now extensively employed.



B. The Maxam-Gilbert Method Of DNA Sequencing

The Maxam-Gilbert method of DNA sequencing is a degradative
method. In this procedure, a fragment of DNA is labeled at one end and
partially cleaved in four separate chemical reactions, each of which is specific
for cleaving the DNA molecule at a particular base (G or C) or at a particular
type of base (A/G, C/T, or A>C). As in the above-described dideoxy method, the
effect of such reactions is to create a set of nested molecules whose lengths are
determined by the locations of a particular base along the length of the DNA
molecule being sequenced. The nested reaction products are then resolved by
electrophoresis, and the end-labeled molecules are detected, typically by
autoradiography when a 32P label is employed. Four single lanes are typically
required in order to determine the sequence.

The Maxam-Gilbert method thus uses simple chemical reagents which 
are readily available. Nevertheless, the dideoxy-mediated method has several 
advantages over the Maxam-Gilbert method. The Maxam-Gilbert method is
extremely laborious and requires meticulous experimental technique. In 
contrast, the Sanger method may be employed on larger nucleic acid molecules. 

Significantly, in the Maxam-Gilbert method the sequence is obtained 
from the original DNA molecule, and not from an enzymatic copy. For this 
reason, the method can be used to sequence synthetic oligonucleotides, and to 
analyze DNA modifications such as methylation, etc. It can also be used to 
study both DNA secondary structure and protein-DNA interactions. Indeed, it 
has been readily employed in the identification of the binding sites of DNA 
binding proteins. 

Methods for sequencing DNA using either the dideoxy-mediated 
method or the Maxam-Gilbert method are widely known to those of ordinary 
skill in the art. Such methods are, for example, disclosed in Maniatis, T., et al.,
Molecular Cloning, A Laboratory Manual, 2nd Edition, Cold Spring Harbor
Press, Cold Spring Harbor, New York (1989), and in Zyskind, J.W., et al.,
Recombinant DNA Laboratory Manual, Academic Press, Inc., New York (1988),
both herein incorporated by reference.

Both the dideoxy-mediated method and the Maxam-Gilbert method of 
DNA sequencing require the prior isolation of the DNA molecule which is to be
sequenced. The sequence information is obtained by subjecting the reaction
products to electrophoretic analysis (typically using polyacrylamide gels).
Thus, a sample is applied to a lane of a gel, and the various species of nested
fragments are separated from one another by their migration velocity through
the gel. The number of nested fragments which can be separated in a single 
lane is approximately 200-300 regardless of whether the Sanger or the Maxam- 
Gilbert method is used. Those of great skill in the art can separate up to 600 
fragments in a single lane. Thus, in order to sequence large DNA molecules, it 
is necessary to fragment the molecule, and to sequence the fragments in 
separate lanes of the sequencing gel. The sequence of the entire molecule is 
obtained by orienting and ordering the sequence data obtained from each 
fragment. 

Two approaches have been employed by those of skill in this art to 
accomplish this goal. In a random or shotgun sequencing approach, sequence 
data is collected by subcloning fragments of the target DNA molecule. No
attempt is initially made to determine the linear orientation or order of the
subclones with respect to the intact target DNA molecule. Instead, the
accumulated data are stored and ultimately arranged into order by a computer
(Staden, R., Nucleic Acids Res. 14:217 (1986); Anderson, S. et al., Nature 290:457
(1981); Gingeras, T.R. et al., J. Biol. Chem. 257:13475 (1982); Sanger, F. et al., J. Mol.
Biol. 162:729 (1982), and Baer, R. et al., Nature 310:207 (1984)). As will be
appreciated, such random shotgun approaches often result in the multiple 
sequencing of the same oligonucleotide fragment, and thus are often inefficient 
in terms of time and materials. 
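
The computer-based ordering of shotgun data described above can be sketched as a greedy overlap merge. This is a minimal illustration and not the algorithm of any cited reference; the function names, the minimum overlap of three bases, and the example reads are assumptions chosen for the sketch, and real assemblers must also handle sequencing errors, both strands, and repeats.

```python
# Greedy overlap assembly of shotgun reads (illustrative sketch only).

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # the same fragment may be read twice
    # Discard reads that are wholly contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # remaining contigs cannot be joined
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads; one fragment was sequenced twice.
reads = ["ATGCGTAC", "GTACGTTA", "CGTTAGC", "GTACGTTA"]
print(greedy_assemble(reads))
```

The duplicate read in the example contributes nothing to the assembled result, which is the inefficiency of random approaches noted in the text.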

In contrast, directed approaches have been employed in which sequences
of the target DNA are obtained in a systematic fashion. For example, the target
DNA molecule may be ordered by restriction mapping using the methods
described above, and the discrete restriction fragments sequenced.
Alternatively, the target molecule may be sequenced by sequencing nested sets
of deletions which begin at one of its ends. The use of such nested fragments
progressively brings more and more remote regions of the target DNA into
range for sequencing. Lastly, sequence information obtained from a particular
target molecule can be used to prepare a primer which can then be used in a
subsequent sequencing reaction in order to obtain additional sequence
information. As will be perceived, a directed sequence analysis of a target DNA
molecule often requires substantial a priori information regarding the sequence.
Moreover, for large target molecules (of sizes on the order of kilobases) such as
would be encountered in the sequencing of eukaryotic (and in particular,
mammalian) chromosomes, directional sequencing is quite arduous.

II. Microsequencing and GBA™ Genetic Analysis

In contrast to the "Sanger Method" and the "Maxam-Gilbert method,"
which identify the sequence of all of the nucleotides of a target polynucleotide, 
"microsequencing" methods determine the identity of only a single nucleotide 
at a "predetermined" site. Such methods have particular utility in determining 
the presence and identity of polymorphisms in a target polynucleotide. 

The GBA™ Genetic Bit Analysis method disclosed by Goelet, P. et al.
(WO 92/15712, herein incorporated by reference) is a particularly useful
microsequencing method. In GBA™, the nucleotide sequence information
surrounding a predetermined site of interrogation is used to design an
oligonucleotide primer that is complementary to the region immediately
adjacent to, but not including, the predetermined site. The target DNA
template is selected from the biological sample and hybridized to the
interrogating primer. This primer is extended by a single labeled
dideoxynucleotide using DNA polymerase in the presence of at least two, and
most preferably all four chain terminating nucleoside triphosphate precursors.

Additional, primer-guided, nucleotide incorporation procedures for
assaying polymorphic sites in DNA have also been described (Kornher, J. S. et
al., Nucl. Acids Res. 17:7779-7784 (1989); Sokolov, B. P., Nucl. Acids Res.
18:3671 (1990); Syvanen, A.-C., et al., Genomics 8:684-692 (1990); Kuppuswamy,
M.N. et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant, T.R. et
al., Hum. Mutat. 1:159-164 (1992); Ugozzoli, L. et al., GATA 9:107-112 (1992);
Nyren, P. et al., Anal. Biochem. 208:171-175 (1993); and Wallace, WO89/10414).
These methods differ from Genetic Bit™ Analysis in that they all rely on the
incorporation of labeled deoxynucleotides to discriminate between bases at a
polymorphic site. In such a format, since the signal is proportional to the
number of deoxynucleotides incorporated, polymorphisms that occur in runs of
the same nucleotide can result in signals that are proportional to the length of
the run (Syvanen, A.-C., et al., Amer. J. Hum. Genet. 52:46-59 (1993)). Such a
range of locus-specific signals could be more complex to interpret, especially for
heterozygotes, compared to the simple, ternary (2:0, 1:1, or 0:2) class of signals
produced by the GBA™ method. In addition, for some loci, incorporation of an
incorrect deoxynucleotide can occur even in the presence of the correct
dideoxynucleotide (Kornher, J. S. et al., Nucl. Acids Res. 17:7779-7784 (1989)).
Such deoxynucleotide misincorporation events may be due to the Km of the
DNA polymerase for the mispaired deoxy- substrate being comparable, in some
sequence contexts, to the relatively poor Km of even a correctly base paired
dideoxy- substrate (Kornberg, A., et al., In: DNA Replication, Second Edition
(1992), W. H. Freeman and Company, New York; Tabor, S. et al., Proc. Natl.
Acad. Sci. (U.S.A.) 86:4076-4080 (1989)). This effect would contribute to the
background noise in the polymorphic site interrogation.
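
The single-base extension at the heart of such microsequencing, and the ternary signal it gives for a diploid sample, can be sketched as follows. This is a simplified model and not the published GBA™ protocol: it assumes perfect hybridization at an exact-match site, error-free incorporation of one chain terminator, and one template molecule per chromosome copy. All function names and sequences are hypothetical.

```python
# Idealized single-base (chain terminator) extension, illustrative only.
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def interrogate(template, primer):
    """Return the terminator added to the primer's 3' end, or None.

    Both sequences are written 5'->3'. The primer anneals to its exact
    complement in the template, and the incorporated base is complementary
    to the template base immediately 5' of the annealing site.
    """
    site = reverse_complement(primer)
    pos = template.find(site)
    if pos <= 0:
        return None  # no annealing site, or no template base left to copy
    return COMPLEMENT[template[pos - 1]]


def allele_signal(chromosome_copies, primer, allele_1, allele_2):
    """Count terminators over both chromosome copies: (2, 0), (1, 1) or (0, 2)."""
    counts = Counter(interrogate(copy, primer) for copy in chromosome_copies)
    return counts[allele_1], counts[allele_2]


# Hypothetical locus: the polymorphic base sits immediately 5' of the site.
primer = reverse_complement("TTGCAGGC")
copy_a = "ACGTA" + "TTGCAGGC" + "TA"   # template base A, terminator T
copy_g = "ACGTG" + "TTGCAGGC" + "TA"   # template base G, terminator C
print(allele_signal([copy_a, copy_g], primer, "T", "C"))
```

Because exactly one terminator is added per primer, the counts cannot exceed the number of template copies, which is why the signal classes stay ternary regardless of neighboring runs of the same base.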

Mundy, C.R. (U.S. Patent No. 4,656,127) discusses alternative
microsequencing methods for determining the identity of the nucleotide
present at a particular polymorphic site. Mundy's methods employ a
specialized exonuclease-resistant nucleotide derivative. A primer
complementary to the allelic sequence immediately 3'- to the polymorphic site is
permitted to hybridize to a target molecule obtained from a particular animal or
human. If the polymorphic site on the target molecule contains a nucleotide
that is complementary to the particular exonuclease-resistant nucleotide
derivative present, then that derivative will be incorporated by a polymerase
onto the end of the hybridized primer. Such incorporation renders the primer
resistant to exonuclease, and thereby permits its detection. Since the identity of
the exonuclease-resistant derivative of the sample is known, a finding that the
primer has become resistant to exonucleases reveals that the nucleotide present
in the polymorphic site of the target molecule was complementary to that of the
nucleotide derivative used in the reaction. The Mundy method has the
advantage that it does not require the determination of large amounts of
extraneous sequence data. It has the disadvantages of destroying the amplified
target sequences and unmodified primer, and of being extremely sensitive to
the rate of polymerase incorporation of the specific exonuclease-resistant
nucleotide being used.

Cohen, D. et al. (French Patent 2,650,840; PCT Appln. No. WO91/02087)
discuss a solution-based method for determining the identity of the nucleotide
of a polymorphic site. As in the Mundy method of U.S. Patent No. 4,656,127, a
primer is employed that is complementary to allelic sequences immediately 3'-
to a polymorphic site. The method determines the identity of the nucleotide of
that site using labeled dideoxynucleotide derivatives, which, if complementary
to the nucleotide of the polymorphic site, will become incorporated onto the
terminus of the primer.

In contrast to the method of Cohen et al. (French Patent 2,650,840; PCT
Appln. No. WO91/02087), the GBA™ method of Goelet, P. et al. can be
conducted as a heterogeneous phase assay, in which the primer or the target
molecule is immobilized to a solid phase. It is thus easier to perform, and more
accurate than the method discussed by Cohen. The method of Cohen has the
significant disadvantage of being a solution-based extension method that uses
labeled dideoxynucleoside triphosphates. In the Cohen method, the target
DNA template is usually prepared by a DNA amplification reaction, such as the
PCR, that uses a high concentration of deoxynucleoside triphosphates, the
natural substrates of DNA polymerases. These monomers will compete in the
subsequent extension reaction with the dideoxynucleoside triphosphates.
Therefore, following the PCR, an additional purification step is required to
separate the DNA template from the unincorporated dNTPs. Because it is a
solution-based method, the unincorporated dNTPs are difficult to remove and
the method is not suited for high volume testing.

III. Sequencing Via Hybridization To Ordered Oligonucleotide Arrays

In response to the difficulties encountered in employing gel
electrophoresis to analyze sequences, alternative methods have been developed.
Macevicz (U.S. Patent 5,002,867), for example, describes a method for
determining nucleic acid sequence via hybridization with multiple mixtures of
oligonucleotide probes. In accordance with such method, the sequence of a
target polynucleotide is determined by permitting the target to sequentially
hybridize with sets of probes having an invariant nucleotide at one position,
and variant nucleotides at other positions. The Macevicz method determines
the nucleotide sequence of the target by hybridizing the target with a set of
probes, and then determining the number of sites that at least one member of
the set is capable of hybridizing to the target (i.e. the number of "matches").
This procedure is repeated until each member of a set of probes has been
tested.

IV. Limitation Of Conventional Methods

Several factors may limit the use of conventional methods in the analysis 
of the nucleotide sequence of a target molecule. Typically, each lane of a 
sequencing gel can resolve only about 300 different fragments. Thus, in order 
to determine the nucleotide sequence of a large DNA molecule, multiple 
sequencing gels are often needed. This, in turn, limits the amount of new 
sequence information which can be readily obtained per day. For a large 
nucleic acid molecule, a substantial number of technically demanding and time 
consuming steps must be performed. In particular, since the above-described 
techniques are capable of analyzing only one set of nested oligonucleotides per 
sample, the sequencing of large DNA molecules requires the use of multiple
sequencing gels each having a large number of lanes. The electrophoretic 
analysis step in the sequencing process thus comprises a significant limitation 
to the amount of sequence information which can be obtained and the rate with 
which it can be processed. 
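
The scale of this limitation can be estimated with simple arithmetic. The sketch below is a lower bound under stated assumptions: each set of nested fragments yields at most about 300 resolved bases, each set occupies four lanes (one per base-specific reaction, as in the methods described above), and no overlap between neighboring fragments is counted. The function names and default values are hypothetical.

```python
# Lower-bound estimate of gel lanes needed (illustrative arithmetic only).
import math


def reads_needed(target_length, bases_per_read=300):
    """Minimum number of nested-fragment sets needed to cover the target."""
    return math.ceil(target_length / bases_per_read)


def lanes_needed(target_length, bases_per_read=300, lanes_per_read=4):
    """Minimum number of gel lanes, assuming four lanes per fragment set."""
    return reads_needed(target_length, bases_per_read) * lanes_per_read


for length in (300, 3_000, 50_000):
    print(length, reads_needed(length), lanes_needed(length))
```

Any overlap required to order the fragments, and any repeated sequencing of the same fragment, raises these figures further.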

In summary, a method which would permit accurate nucleotide 
sequencing without gel analysis would be highly desirable. Indeed, for the 
analysis of very large genomes, such as the human genome, the development of 
such methods may be essential.

SUMMARY OF THE INVENTION

The invention provides a solid phase sequencing method for determining
the sequence of nucleic acid molecules (either DNA or RNA). In detail, the
invention provides a method for determining the nucleotide sequence of a
nucleic acid molecule which comprises the steps of:

(A) arraying a set of nested primer oligonucleotides onto a solid
support, each array position containing a different array member having a
predetermined sequence;

(B) incubating oligonucleotides of the array in the presence of a
preparation of the nucleic acid molecules, a polymerase and at least one chain
terminator nucleotide; wherein the incubation is under conditions sufficient to
permit DNA hybridization to occur between the oligonucleotides of the
incubation and the nucleic acid molecules; wherein the incubation is conducted
in the substantial absence of any non-chain terminator nucleotides;

(C) (1) in the case wherein the 3' terminal nucleotide of an
oligonucleotide is hybridized to the nucleic acid molecule, permitting
oligonucleotides hybridized to nucleic acid molecules to be extended by
polymerase-mediated incorporation of a single chain terminator nucleotide
residue onto the 3' terminus of the hybridized oligonucleotide, wherein for each
hybridized oligonucleotide being so extended, the incorporated nucleotide
residue is complementary to the nucleotide residue immediately 5' to the
nucleotide residue of the nucleic acid molecule that is hybridized with that
oligonucleotide's 3' terminal nucleotide residue; then performing step (D);

(2) in the case wherein the 3' terminal nucleotide of an
oligonucleotide is not hybridized to the nucleic acid molecule, either:

(a) not permitting oligonucleotides hybridized to
nucleic acid molecules to be extended by polymerase-mediated incorporation of
a single chain terminator nucleotide residue onto the 3' terminus of the
hybridized oligonucleotide, or

(b) permitting the removal of any non-hybridized
nucleotide residues from the 3' terminus of the hybridized oligonucleotides, so
as to form a truncated primer oligonucleotide whose 3' terminus is hybridized
to the nucleic acid molecule, and then permitting polymerase-mediated
incorporation of a single chain terminator nucleotide residue onto the 3'
terminus of the hybridized truncated oligonucleotide, wherein for each
hybridized truncated oligonucleotide being so extended, the incorporated
nucleotide residue is complementary to the nucleotide residue immediately 5'
to the nucleotide residue of the nucleic acid molecule that is hybridized with
that truncated oligonucleotide's 3' terminal nucleotide residue; then performing
step (D);

(D) determining, at each array position at which an oligonucleotide has 
incorporated a single chain terminator nucleotide residue, the identity of the 
incorporated chain terminator nucleotide residue; and 

(E) determining the nucleotide sequence of the nucleic acid molecule 
from the determined identity of the incorporated nucleotide of primer 
oligonucleotides of the array, and known sequence of the oligonucleotide at
each array position. 
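
Steps (A) through (E) can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the method as practiced: primers are designed against a known reference strand and nested at one-base intervals, each primer anneals at its designed position, extension follows case (C)(1) when the primer's 3' terminal base pairs with the sample and is blocked as in case (C)(2)(a) otherwise, and unreported positions are written as "N". The function names, primer length, and sequences are hypothetical.

```python
# Idealized nested-primer array readout, illustrative only.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def nested_primers(reference, primer_len):
    """Step (A): map each interrogated position to its primer (5'->3').

    The primer for position pos is the reverse complement of the reference
    bases immediately 3' of that position, so its 3' terminal base pairs
    with the base at pos + 1.
    """
    array = {}
    for pos in range(len(reference) - primer_len):
        site = reference[pos + 1 : pos + 1 + primer_len]
        array[pos] = "".join(COMPLEMENT[base] for base in reversed(site))
    return array


def extend(primer, sample, pos):
    """Steps (B) and (C): add one chain terminator, or None if blocked."""
    if COMPLEMENT[primer[-1]] != sample[pos + 1]:
        return None  # 3' terminal base unpaired: case (C)(2)(a)
    return COMPLEMENT[sample[pos]]  # case (C)(1)


def sequence_by_array(reference, sample, primer_len):
    """Steps (D) and (E): read each terminator and rebuild the sample strand."""
    calls = ["N"] * len(sample)
    for pos, primer in nested_primers(reference, primer_len).items():
        terminator = extend(primer, sample, pos)
        if terminator is not None:
            calls[pos] = COMPLEMENT[terminator]
    return "".join(calls)


reference = "ACGTTGCAAGGCTA"
variant = "ACGTTTCAAGGCTA"  # hypothetical G-to-T substitution at index 5
print(sequence_by_array(reference, reference, 4))
print(sequence_by_array(reference, variant, 4))
```

In this model the substituted base is reported by the primer ending just 3' of it, while the neighboring primer whose 3' terminus would pair with the substituted base gives no signal; option (C)(2)(b) is not modeled here.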

The invention particularly concerns embodiments in which each array position
contains a primer oligonucleotide that is capable of hybridizing to a region of
the nucleic acid molecule, and/or wherein in step (C), at least some array
positions contain nucleic acid molecules hybridized to oligonucleotides
whose 3' terminal nucleotide is not hybridized to the nucleic acid molecule, and
wherein step (C)(1) is conducted for such oligonucleotides.

Either a Thermosequenase class polymerase or a Klenow class 
polymerase may be employed in the method. 

The invention particularly includes the embodiments in which the array 
is a random oligonucleotide array, and in which the array is a nested 
oligonucleotide array (especially one containing oligonucleotide members 
having all possible permutations of nucleotides over a region of from 1 to 20
bases).

The invention is particularly adapted for conducting the method in the
presence of at least four chain terminator nucleotide species, at least one of
which is labeled, and more preferably wherein all of the chain terminator
nucleotide species are labeled, and wherein the label of any such species can be
distinguished from the label of any other species present.

The invention particularly provides a method of sequence determination 
for genomic DNA of a human or non-human mammal, and is especially 
adapted for use in determining the sequence of DNA suspected to contain a 
genetic variation associated with a disease (e.g., cancer or cystic fibrosis), and in 
which the method is employed to determine whether the DNA contains the
variation. 

In a preferred embodiment of the method, the oligonucleotides are
immobilized onto the solid support (such as plastic or glass).

The invention also provides a kit for determining the sequence of a 
nucleic acid molecule which comprises a solid support containing an array of
spaced apart receptacles for oligonucleotides, each receptacle containing a
different primer oligonucleotide. The kit may additionally contain at least four
chain terminator nucleotide species, at least one of which is labeled. A highly
preferred kit contains at least four chain terminator nucleotide species, wherein
all of the chain terminator nucleotide species are labeled, and wherein the label
of any such species can be distinguished from the label of any other species 
present. 

The kit is particularly suited for determining the nucleotide sequence of 
DNA suspected to contain a genetic variation associated with a disease, and to 
provide a determination of nucleotide sequence sufficient to determine whether
the DNA contains the variation. 

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 shows the result of a nested GBA™ (N-GBA™) experiment.
Figure 2 shows the four major p53 mutational hot-spot regions
containing most cited p53 mutations, indicated by the black bars marked A-D:
A = codons 132-143, B = codons 174-179, C = codons 236-258 and D = codons
272-282 (del = deletion; ins = insertion).

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention provides a method of sequencing which provides
the advantages of micro- and nano-sequencing and has the ability to sequence
polynucleotide regions. In brief, the method employs ordered arrays of linear
primers that are capable of hybridizing to a target molecule and reporting the
identity of the single nucleotide that is present in the hybridized molecule
immediately 5' to the 3' terminus of the primer. By employing a suitable array
of such primers, the invention permits one to ascertain the complete nucleotide
sequence of a target polynucleotide. There are thus two central aspects to the
present invention: the method of sequence analysis, and the nature of the
primer array.

I. GBA™ Sequence Analysis 

The most preferred method of the present invention employs a
modification of the GBA™ method of analyzing a predetermined site as the
means for accomplishing sequence analysis. The GBA™ method can be
conducted in a variety of ways. In particular, such interrogation can be
accomplished via a polymerase-mediated analysis or by a ligase-mediated
analysis.

A. Polymerase-Mediated Analysis

The polymerase-mediated analysis is more fully described by Goelet, P.
et al. (PCT Application WO 92/15712, herein incorporated by reference). In this
assay, a purified oligonucleotide having a defined sequence (complementary to 
an immediate proximal or distal sequence of a polymorphism) is bound to a 
solid support, especially a microtiter dish. A sample, suspected to contain the 
target molecule, or an amplification product thereof, is placed in contact with 
the support, and any target molecules present are permitted to hybridize to the 
bound oligonucleotide. 

In one preferred embodiment, an oligonucleotide having a sequence that
is complementary to an immediately distal sequence of a polymorphism is
prepared using the above-described methods (and preferably that of Nikiforov,
T. (U.S. Patent Application Serial No. 08/005,061, herein incorporated by
reference)). The terminus of the oligonucleotide is attached to the solid support,
as described, for example by Goelet, P. et al. (PCT Application WO 92/15712),
such that the 3'-end of the oligonucleotide can serve as a substrate for primer
extension.

The immobilized primer is then incubated in the presence of a DNA
molecule (preferably a genomic DNA molecule) having a single nucleotide
polymorphism whose immediately 3'-distal sequence is complementary to that
of the immobilized primer. Preferably, such incubation occurs in the complete
absence of any dNTP (i.e. dATP, dCTP, dGTP, or dTTP), but only in the
presence of one or more chain terminating nucleotide triphosphate derivatives
(such as a dideoxy derivative), and under conditions sufficient to permit the
incorporation of such a derivative on to the 3'-terminus of the primer. As will
be appreciated, where the polymorphic site is such that only two or three alleles
exist (such that only two or three species of dNTPs, respectively, could be
incorporated into the primer extension product), the presence of unusable
nucleotide triphosphate(s) in the reaction is immaterial. In consequence of the
incubation, and the use of only chain terminating nucleotide derivatives, a
single dideoxynucleotide is added to the 3'-terminus of the primer. The identity
of that added nucleotide is determined by, and is complementary to, the
nucleotide of the polymorphic site of the polymorphism.

WO 97/35033 PCT/US97/03701

In this embodiment, the nucleotide of the polymorphic site is thus 
determined by assaying which of the set of labeled nucleotides has been 
incorporated onto the 3'-terminus of the bound oligonucleotide by a
primer-dependent polymerase. Most preferably, where multiple dideoxynucleotide
derivatives are simultaneously employed, different labels will be used to permit
the differential determination of the identity of the incorporated
dideoxynucleotide derivative.

B. Polymerase/Ligase-Mediated Analysis

In an alternative embodiment, the identity of the nucleotide of the
polymorphic site is determined using a polymerase/ligase-mediated process.
As in the above embodiment, an oligonucleotide primer is employed that is
complementary to the immediately 3'-distal invariant sequence of the
polynucleotide being analyzed. A second oligonucleotide is tethered to the
solid phase via its 3'-end. The sequence of this oligonucleotide is
complementary to the 5'-proximal sequence of the predetermined site being
analyzed, but is incapable of hybridizing to the oligonucleotide primer.

These oligonucleotides are incubated in the presence of DNA containing
the single nucleotide polymorphism that is to be analyzed, and at least one 2',
5'-deoxynucleotide triphosphate. The incubation reaction further includes a
DNA polymerase and a DNA ligase.

The tethered and soluble oligonucleotides are thus capable of
hybridizing to the same strand of the single nucleotide polymorphism under
analysis. The sequence considerations cause the two oligonucleotides to
hybridize to the proximal and distal sequences of the polynucleotide site that
flank the predetermined site; the hybridized oligonucleotides are thus
separated by a "gap" of a single nucleotide at the precise position of the
predetermined site.

The presence of a polymerase and a deoxynucleotide complementary to
the nucleotide of the gap permits ligation of the primer extended with the
complementary deoxynucleotide to the immobilized oligonucleotide
complementary to the distal sequence; that is, a deoxynucleotide triphosphate
that is complementary to the nucleotide of the polymorphic site permits the
creation of a ligatable substrate. The ligation reaction immobilizes the
deoxynucleotide and the previously soluble primer oligonucleotide to the solid
support.

The identity of the polymorphic site that was opposite the "gap" can then
be determined by any of several means. In a preferred embodiment, the
deoxynucleotide of the reaction is labeled, and its detection thus reveals the
identity of the complementary nucleotide of the predetermined site. Several
different deoxynucleotides may (and preferably will) be present, each
differentially labeled. Alternatively, separate reactions can be conducted, each
with a different deoxynucleotide. In an alternative sub-embodiment, the
deoxynucleotides are unlabeled and a labeled dideoxynucleotide is employed,
and the second, soluble oligonucleotide is labeled. Separate reactions are
conducted, each using a different unlabeled dideoxynucleotide. The reaction
that contains the complementary nucleotide permits the ligatable substrate to
form, and is detected by detecting the immobilization of the previously soluble
oligonucleotide.
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C Signal-Amplification 

The sensitivity of nucleic acid hybridization detection assays may be
increased by altering the manner in which detection is reported or signaled to
the observer. Thus, for example, assay sensitivity can be increased through the
use of detectably labeled reagents. A wide variety of such signal amplification
methods have been designed for this purpose. Kourilsky et al. (U.S. Patent
4,581,333) describe the use of enzyme labels to increase sensitivity in a
detection assay. Fluorescent labels (Prober, J. et al., Science 238:336-340
(1987); Albarella et al., EP 144914), chemical labels (Sheldon III et al., U.S.
Patent 4,582,789; Albarella et al., U.S. Patent 4,563,417), modified bases
(Miyoshi et al., EP 119448), etc. have also been used in an effort to improve
the efficiency with which hybridization can be observed.

It is preferable to employ fluorescent, and more preferably chromogenic
(especially enzyme) labels, such that the identity of the incorporated nucleotide
can be determined in an automated, or semi-automated manner using a
spectrophotometer.

D. Use of GBA™ Analysis in the Methods of the Present
Invention

GBA™ was developed as a solid-phase single nucleotide polymorphism
genotyping method based on single-base extension of an interrogation primer
across a target base of interest. In contrast to gel-based testing, a solid-phase
array can be manufactured in a standardized way with quality control, thereby
ensuring that variation in performance of the test is more a factor of input DNA
quality and less of operator expertise. The present invention extends this
method to an N-GBA™ format, with the complementary interrogation primers
nested at one (or more) base intervals across the target sequence, and thus
enables detailed sequence analysis of a complex target DNA sequence. While
GBA™ is well suited to single-base interrogations, the N-GBA™ method of the
present invention is ideally suited to analysis of intermediate length (10-100
base) DNA target sequences. Application of the N-GBA™ method in a
Sequence Confirmation/composition ANalysis (SCAN™) chip prototype (a
miniaturized array of interrogation primers on a glass slide) is the most
preferred embodiment of the method, and permits standardized (through
manufacture of the oligonucleotide interrogation primer arrays), lower cost
(through miniaturization of the test) and accurate (through use of the GBA™
biochemistry) scanning for p53 mutations.

The solid-phase format of the present invention also provides
advantages in processing, since reagents can be added by hand at small scales,
or by robots on a larger scale, without changes to the test. The size of the arrays
can be controlled as well, so that the advantages of miniaturization can be
realized: thus a 30 µl PCR reaction can be hybridized simultaneously to
hundreds or thousands of oligonucleotides in an array only a few millimeters in
diameter. In this way, processing can be performed at a "macro" scale, using
standard pipetters, and information extracted at a "micro" or "nano" scale
using fluorescent imaging. These advantages provide a lower cost test having
much more reproducible performance. Unlike methods that rely on
hybridization as the method of analysis, the methods of the present invention
exploit the use of primer extension biochemistry for nucleotide-by-nucleotide
analysis and its application to a solid-phase oligonucleotide array format. The
addition of primer extension to solid-phase analysis adds significant increases
in test accuracy and differential sensitivity over hybridization-based
approaches while exploiting the advantages of solid-phase-based testing over
gel-based tests.

This strategy of nesting the GBA™ across a region of interest eliminates
any need to "expect" (i.e., guess in advance) a particular mutation. Nesting
eliminates any need to limit analysis to a specific nucleotide. Current GBA™
detection technology is a "two-result" system (distinguishing wild-type from
mutant). An additional innovation of the preferred embodiments of the present
invention involves the use of a "four-result" system, which, by parallel
detection of all four possible DNA bases for each site in the sequence, provides
enhanced accuracy. With this innovation, the change of any nucleotide in the
target region to any other possible base will be detectable in a base-specific
fashion; thus any mutation in a proposed target hot-spot will be identifiable,
including novel mutations.

In one embodiment, this is accomplished by separating the arrays into
four identical array spots to which PCR or other amplified product can
hybridize equally. The GBA™ extension is thus preferably divided into four
reaction mixes, each containing a different haptenated dideoxynucleotide
triphosphate (ddNTP). The four spots represent the four possible bases: G, A, T
and C, and incorporation of each possible base can be evaluated for each
oligonucleotide in the array and from this the sequence composition of the
target fragment deduced. The SCAN™-chip format, utilizing N-GBA™
biochemistry, will thus enable: highly accurate mutation detection due to the
sensitivity of primer extension to hybridization mismatch at the 3' (extended)
end of the interrogation primer; increased informativeness since the mutation is
detected in a highly localized fashion; relatively standardizable and simple
testing due to the SCAN™ format; and cost-effectiveness due to miniaturization
of the arrays.

In accordance with the methods of the present invention, the target
polynucleotide (i.e., the nucleic acid molecule that is to be sequenced) is
provided to each array position of a spatially separated array of oligonucleotide
primers in single-stranded form, under conditions sufficient to permit
hybridization to occur. As used herein, an array of oligonucleotides is said to
be "spatially separated" if an oligonucleotide of one sequence is separated from
an oligonucleotide of another sequence. In the microminiaturized method
described below, each oligonucleotide species of the array is provided to a
separate microtiter well. In contrast, in the nanominiaturized method, each
oligonucleotide species of the array is provided to a distinct region of a surface,
such as a glass slide, etc. As used herein, the term "array" is intended to define
a two dimensional or three dimensional matrix having a definition of X,Y or
X,Y,Z, such that, for example, at array position 1,1 a particular oligonucleotide
is found; an oligonucleotide of different sequence is found at array position 1,2
or 2,1, etc. For each array, the oligonucleotide found at each array position is
defined and known in advance of any reaction.

The sequence of each oligonucleotide of each array position is selected
such that it will be shorter in length than the target polynucleotide being
sequenced. Most preferably, such oligonucleotides will be less than 30 bases in
length, and most preferably less than 10 bases. Oligonucleotides of 5 bases in
length are preferred. As such, if an oligonucleotide of N residues hybridizes to
the target polynucleotide, its 3' terminus (residue N) will hybridize to a
nucleotide of the target polynucleotide, and can be extended via a
template-dependent polymerization reaction to incorporate an "interrogation
nucleotide" as residue N+1 of that oligonucleotide. The identity of the
"interrogation nucleotide" is dependent upon (and is complementary to) the
nucleotide species of the target polynucleotide that is present immediately 5'
adjacent to the nucleotide that hybridizes to the 3' terminus of the
oligonucleotide, prior to the polymerization reaction.
Each array position additionally contains more than one different
nucleotide species, such that nucleotide species are present that are
complementary to at least two, and in the most preferred embodiment, all four
of the nucleotide species of DNA (i.e., adenosine, cytosine, thymidine and
guanosine, designated A, C, T and G, respectively). The nucleotide species
present are "chain terminator" nucleotides. Although such nucleotide species
can be incorporated onto the 3' terminus of an oligonucleotide by a DNA
polymerase, the resultant extended oligonucleotide cannot be further extended
by a polymerase, even in the presence of non-terminator nucleotides. The most
preferred chain terminator nucleotide species of the present invention are
2',3'-dideoxynucleoside 5'-triphosphates. The chain terminator nucleotide
species are detectably labeled, such that an extension reaction that results in the
incorporation of a nucleotide complementary to one of the nucleotide species of
DNA can be distinguished from an extension reaction that results in the
incorporation of a nucleotide complementary to a different nucleotide species of
DNA. Any of the conventionally used radioisotopic, enzymatic, fluorescent or
chemiluminescent labels may be used in accordance with the methods of the
present invention. In lieu of such labels, haptenic labels, such as biotin or other
labels such as ligands, antigens, etc. may be used. Suitable labels are disclosed,
for example, by Kourilsky et al. (U.S. Patent 4,581,333), Prober et al. (Science
238:336-340 (1987)), Albarella et al. (EP 144914), Sheldon III et al. (U.S. Patent
4,582,789), Albarella et al. (U.S. Patent 4,563,417), and Miyoshi et al. (EP 119448).

It is, however, preferred to employ the enzyme-mediated fluorescence
precipitation method (Huang, Z. et al., Anal. Biochem. 207:32-39 (1992), herein
incorporated by reference). In this method of detection, a fluorogenic signal is
determined by precipitation at a localized reaction site. This novel detection
chemistry actually combines the powers of enzymatic amplification, rapid in
situ product precipitation, high contrast of fluorescence signal over (glass)
background, and quantitation of fluorescent signal. The method thus provides
greater sensitivity than direct fluorescence detection and is operationally
compatible with a high density oligonucleotide glass array format.

A polymerase, and suitable salts and buffers are also provided to each
array position. The reaction conditions are maintained such that the
oligonucleotides stably and specifically hybridize to the target polynucleotide,
and so that the 3'-termini of the oligonucleotides are extended by addition of
a single chain terminator nucleotide (i.e., the interrogation nucleotide). As used
herein, "stable" hybridization refers to a hybridization that has a Tm greater
than the temperature under which the interrogation assay is to be run
(generally 20-40°C). The term "specific" hybridization denotes that the length
and/or sequence complexity of the oligonucleotides involved in the
hybridization are sufficient to preclude non-desired spurious hybridization (as
might occur, for example, between sequences that are only partially
complementary). The hybridization is usually carried out for 15 to 30 minutes
at room temperature in a solution containing 1.5 M NaCl and 10 mM EDTA.
Other hybridization conditions can alternatively be used. The sequence of the
immobilized oligonucleotide is selected such that it will hybridize to the
invariant sequence that flanks the polymorphic site of the polymorphism that is
to be interrogated.

If the ligase/polymerase mediated GBA™ interrogation method is to be
employed, the methods of Nikiforov et al. (U.S. Patent Application Serial No.:
08/192,631, herein incorporated by reference) are preferably employed.

Most preferably, the oligonucleotides present at each array position are
immobilized to the solid surface of the array support. Such a support may be a
microtiter dish, test tube array, etched glass surface, etc.

II. Nature of the Oligonucleotide Array

The nature of the oligonucleotide array may vary depending upon the
amount of prior sequence information available concerning the target molecule.
In one embodiment of the invention, the array is "non-random." As used
herein, a "non-random" oligonucleotide array is a set of oligonucleotides whose
members do not contain all possible permutations of nucleotides. A non-random
array is preferably employed when determining the nucleotide
sequence of a polynucleotide for which some a priori sequence information is
available. Thus, for example, non-random arrays would be employed in
sequencing those genes of a patient for which the sequence of "normal" alleles
had been previously determined. In contrast, a "random" array of
oligonucleotides is a set of oligonucleotides whose members do contain all
possible permutations of nucleotides. A random array is preferably employed
when determining the nucleotide sequence of a polynucleotide for which little
or no a priori sequence information is available.

Primer design is preferably facilitated through the use of the GBA™
Primer 1.0 program (Molecular Tool, Inc.). Primer stability (measured in
-kcal/mol) and potential sequence-based sources of noise are evaluated by this
program. A number of sequence-based features can lead to GBA™ noise for a
particular target site. The most common source of noise is template-independent
noise (TIN), which results from self-priming by the GBA™ primer.
To eliminate TIN, GBA™ primers may be modified by a base substitution with
C3 linker or by shortening the primer at the 5' end without sacrificing
hybridization stability of the template strand. In the N-GBA™ system, a set of
GBA™ primers which complement the target sequence and are staggered by
one base will be designed according to the standard GBA™ primer design
strategies described above. An example of N-GBA™ primer design was shown
in a model study described in the relevant experience section.
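
The one-base staggering described above can be sketched in a few lines of Python. This is only an illustration and not the GBA™ Primer 1.0 program: the function name `nested_primers` and its parameters are hypothetical, and the 30-base string is simply the region spanned by the E23+ oligonucleotides listed in Table 4. Primer stability scoring and TIN screening are not modeled.

```python
from typing import List


def nested_primers(primer_strand: str, length: int, step: int = 1) -> List[str]:
    """Return every window of `length` bases, taken at `step`-base intervals.

    `primer_strand` is written 5'->3' in the same sense as the primers,
    i.e. it is the complement of the target strand being interrogated.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if length > len(primer_strand):
        raise ValueError("primer length exceeds the available sequence")
    last_start = len(primer_strand) - length
    return [primer_strand[i:i + length] for i in range(0, last_start + 1, step)]


# Region covered by the E23+1 ... E23+6 oligonucleotides of Table 4.
E23_PLUS_REGION = "CTTGTGCTGACTTACCAGATGGGACACTCT"

e23_primers = nested_primers(E23_PLUS_REGION, length=25)
for number, primer in enumerate(e23_primers, start=1):
    print(f"E23+{number}  {primer}")
```

Passing `step=2` (or more) gives the coarser nesting that the text allows when primers are staggered by more than one base.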

A. Non-Random Nested Arrays

In circumstances in which part of the sequence of the target molecule
(or of a normal or reference sequence) has been previously determined, the
oligonucleotide array can comprise a set of non-random nested
oligonucleotides.

In the simplest embodiment, the nested primer array will contain all
possible divergent sequences over the region whose sequence is to be
determined. The maximum number of primers needed to determine the
sequence of N nucleotides is given by the equation:

$$\sum_{i=1}^{N} 4^{i-1}$$

As such, the maximum number of sequences needed to obtain the
sequence of even a relatively small region rises rapidly when non-random
arrays are employed; the method is therefore not preferred when more extensive
sequencing is desired. For example, a maximum of 349,525 primers would be
needed to obtain 10 nucleotides of sequence information by this method.
Hence, for obtaining such (or even more extensive) sequence information, the
random array method described below is preferably employed.
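
The growth of this sum can be checked numerically. The short Python sketch below is illustrative only; the function names are hypothetical. It evaluates the sum term by term and compares it with the closed form of the geometric series, (4^N - 1)/3.

```python
def max_primers(n_positions: int) -> int:
    """Sum of 4**(i-1) for i = 1..n_positions (term-by-term evaluation)."""
    return sum(4 ** (i - 1) for i in range(1, n_positions + 1))


def max_primers_closed_form(n_positions: int) -> int:
    """Closed form of the same geometric series: (4**N - 1) / 3."""
    return (4 ** n_positions - 1) // 3


for n in (1, 2, 3, 4, 10):
    # Both evaluations must agree for every N.
    assert max_primers(n) == max_primers_closed_form(n)
    print(f"N = {n:2d}: at most {max_primers(n):,} primers")
```

Each additional position multiplies the size of the largest nested set by four, which is why the non-random approach becomes impractical beyond a handful of positions.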

Thus, to sequence four nucleotides in the simplest embodiment, a set of
(1+4+16+64=) 85 primers would be needed. This aspect of the invention is
illustrated in Table 1, which shows the sequences of four arrays of nested
non-random 25-mer oligonucleotides ("oligos"), comprising all possible
permutations of sequence on the final 3 nucleotides. The extent of nesting
shown in Table 1 is one nucleotide; however, the array oligonucleotides can be
nested by more than one nucleotide if desired. By using each oligonucleotide of
a set as a GBA™ primer in a GBA™ reaction (either in the presence of three
unlabeled terminator nucleotides and one labeled chain terminator nucleotide
or in the presence of four differentially labeled chain terminator nucleotides), it
is possible to determine the nucleotide sequence of the particular nucleic acid
molecule of a sample that is complementary to the set of primers.

In some circumstances fewer primers may be employed. For example, if
it were known that only one of two nucleotide candidates were possible at
position 27 (e.g., either A or C, but not T or G), only (1+2+8=) 11 primers would
be needed to sequence the three nucleotide positions of any particular target
molecule.



Table 1

SEQ ID NO   Nucleotide Sequence of Positions   Position Sequenced
 1          CTTGTGCTGACTTACCAGATGGGAC          26
 2          TTGTGCTGACTTACCAGATGGGACA          27
 3          TTGTGCTGACTTACCAGATGGGACC          27
 4          TTGTGCTGACTTACCAGATGGGACT          27
 5          [not legible in source]            27
 6          TGTGCTGACTTACCAGATGGGACAA          28
 7          TGTGCTGACTTACCAGATGGGACAC          28
 8          TGTGCTGACTTACCAGATGGGACAT          28
 9          TGTGCTGACTTACCAGATGGGACAG          28
10          TGTGCTGACTTACCAGATGGGACCA          28
11          TGTGCTGACTTACCAGATGGGACCC          28
12          TGTGCTGACTTACCAGATGGGACCT          28
13          TGTGCTGACTTACCAGATGGGACCG          28
14          TGTGCTGACTTACCAGATGGGACTA          28
15          TGTGCTGACTTACCAGATGGGACTC          28
16          TGTGCTGACTTACCAGATGGGACTT          28
17          TGTGCTGACTTACCAGATGGGACTG          28
18          TGTGCTGACTTACCAGATGGGACGA          28
19          TGTGCTGACTTACCAGATGGGACGC          28
20          TGTGCTGACTTACCAGATGGGACGT          28
21          TGTGCTGACTTACCAGATGGGACGG          28
22          GTGCTGACTTACCAGATGGGACAAA          29



However, and as discussed above, the GBA™ reaction exploits the ability
of the 3' terminus of the GBA™ primer to hybridize to the target molecule being
interrogated. This characteristic of the present invention permits sequence
determinations with far fewer primers, depending upon the class of polymerase
being employed in the GBA™ reaction. In general, there are two classes of
polymerases. One class, typified by the Klenow fragment of E. coli DNA
polymerase I (Klenow class), possesses 3' to 5' exonuclease activity, and is able
to correct 3' base mismatches in the extended primer. The second class, typified
by the thermostable polymerase, Thermosequenase (USB) (Thermosequenase
class), does not possess 3' to 5' exonuclease activity, and is thus unable to
correct 3' base mismatches in the extended primer. Polymerases of either class
can be employed in accordance with the present invention. The characteristics
of polymerases are shown in Table 2.



Table 2

Enzyme            3' to 5' Exonuclease   Ability to Correct   Possible Outcome
                  Activity               Mismatch             (Signal:Noise)
Klenow Fragment   Strong                 Strong               Low
Exo(-) Klenow     None                   Moderate             Moderate
Sequenase         None                   Moderate             Moderate
AmpliTaq          None                   Weak                 High
Bst Polymerase    None                   Weak                 High
Thermosequenase   None                   Weak                 High



Since Thermosequenase class polymerases do not possess 3' to 5'
exonuclease activity, unless a priori sequence information is available, it is
preferable to employ each oligonucleotide in a nested set of all possible
permutations. Nevertheless, in many circumstances incomplete sets of
oligonucleotides may be employed in concert with Thermosequenase class
polymerases. For example, if SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:6 and
SEQ ID NO:22 were employed to sequence a target having a sequence other
than GTTT at positions 25-28, one or more of the oligonucleotides would fail to
hybridize its 3' terminus to the target, and minimal nucleotide incorporation
would result. Hence a result indicating that GBA™ reactions on a particular
target molecule led to the incorporation of a label for SEQ ID NO:1 and SEQ ID
NO:2, but not for SEQ ID NO:6 or SEQ ID NO:22 would indicate that the
nucleotides at positions 27-28 were not Ts. In one embodiment, such an
observation of impaired incorporation is a useful indication that the sequence of
the target molecule differs from that of the reference allele. As such, this
embodiment is useful in identity and paternity analysis, and in genetic
screening.

In contrast, since Klenow class polymerases can correct mismatches as
well as extend primers, when such polymerases are employed in the GBA™
reaction, incorporation of label may reflect primer repair as well as primer
extension. Thus, the use of Klenow class polymerases in the present invention
has a salient advantage. Instead of needing to provide all permutations of the
sequence to be determined, one need provide only one oligonucleotide for each
position to be determined. Thus, to determine the sequence of positions 26-29
in the example shown above, one would need to provide at most 4
oligonucleotides (i.e., an oligonucleotide, such as SEQ ID NO:1, ending at
position 26, an oligonucleotide, such as SEQ ID NO:2, ending at position 27, an
oligonucleotide, such as SEQ ID NO:6, ending at position 28, and an
oligonucleotide, such as SEQ ID NO:22, ending at position 29).

Thus, when Klenow class polymerases are employed, two possibilities
exist with respect to such an array: a particular nucleotide may become labeled
by extension, or it may become labeled by primer mismatch repair. In general,
only a single unambiguous sequence will be obtained. For example, Table 3
gives the results that would be obtained from the use of SEQ ID NO:1, SEQ ID
NO:2, SEQ ID NO:6 and SEQ ID NO:22 to evaluate a particular target molecule
having the sequence CATGCG at nucleotide positions 25-30.

Table 3

SEQ ID   Nucleotide Sequence              Position No. of Nucleotide   Nucleotide
NO.      of Positions 1-25                Sequenced by Array           Reported
23       CCGTACTCCCATCTGGTAAGTCAGCACAAG
 1       CTTGTGCTGACTTACCAGATGGGAC        25                           G
 2       TTGTGCTGACTTACCAGATGGGACA        25                           G
 6       TGTGCTGACTTACCAGATGGGACAA        28                           C
22       GTGCTGACTTACCAGATGGGACAAA        28                           C



In the case of SEQ ID NO:1, the incorporation of G reflects the removal of
the 3' terminal C residue, and the incorporation of a G (as the nucleotide
complementary to the C at position 25 in the target). In the case of SEQ ID
NO:2, the incorporation of G reflects the removal of the 3' terminal A and C
residues, and the incorporation of a G (as the nucleotide complementary to the
C at position 25 in the target). In the case of SEQ ID NO:6, the incorporation of
C reflects the hybridization of the 3' terminus of the primer to the target, and
the extension of the primer by one nucleotide (C, the nucleotide complementary
to the G at position 28 in the target). In the case of SEQ ID NO:22, the
incorporation of C reflects the removal of the 3' terminal A and A residues, and
the incorporation of a C (as the nucleotide complementary to the G at position
28 in the target).
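
The repair-then-extend behavior that produces the entries of Table 3 can be imitated with a deliberately simplified model. The sketch below is an assumption-laden illustration and not a description of the enzyme's kinetics: it assumes that a Klenow class polymerase trims only the unpaired run of bases at the 3' terminus, that internal mismatches do not prevent hybridization, and that exactly one chain terminator is then added. The names `klenow_report` and `start` are hypothetical; the sequences are SEQ ID NO:23, 1, 2, 6 and 22 as given in Tables 1 and 3.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def klenow_report(primer: str, target: str, start: int):
    """Model 3' mismatch repair followed by single-base extension.

    `start` is the 1-based position at which the primer's 5' base aligns.
    Returns (position, nucleotide) for the terminator that is added.
    """
    # Strand the primers are compared with, written in the primer sense.
    primer_sense = reverse_complement(target)
    window = primer_sense[start - 1:start - 1 + len(primer)]
    kept = len(primer)
    # Assumed exonuclease step: remove unpaired bases from the 3' end only.
    while kept > 0 and primer[kept - 1] != window[kept - 1]:
        kept -= 1
    position = start + kept              # 1-based position being filled in
    return position, primer_sense[position - 1]


TARGET = "CCGTACTCCCATCTGGTAAGTCAGCACAAG"          # SEQ ID NO:23
ARRAY = {                                          # SEQ ID NO -> (primer, start)
    1: ("CTTGTGCTGACTTACCAGATGGGAC", 1),
    2: ("TTGTGCTGACTTACCAGATGGGACA", 2),
    6: ("TGTGCTGACTTACCAGATGGGACAA", 3),
    22: ("GTGCTGACTTACCAGATGGGACAAA", 4),
}

results = {seq_id: klenow_report(primer, TARGET, start)
           for seq_id, (primer, start) in ARRAY.items()}
for seq_id, (position, base) in results.items():
    print(f"SEQ ID NO:{seq_id}: position {position}, {base} reported")
```

Under these assumptions the model returns the same position and nucleotide for each oligonucleotide as Table 3 lists; it makes no claim about signal strength or about how many bases the enzyme actually removes in practice.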

Such sequence assignments flow from the known rules of base pairing.
In the above example, the incorporation of G, G, C and C could not mean that
positions 26-29 of the target were CCGG, because such a nucleotide sequence is
incompatible with the (known) sequence of the 3' terminus of SEQ ID NO:22.
In a similar manner, consideration of the known sequences of the
oligonucleotides reveals the nucleotide position being reported by a particular
nucleotide of the array. In a preferred embodiment, such consideration is
facilitated by performing separate sequence determinations with both a
nucleotide array and its complement (such that the sequences of both strands of
a target molecule are obtained).

WO 97/35033    PCT/US97/03701

Table 4 illustrates typical oligonucleotide arrays by displaying sets of
oligonucleotides sufficient to permit sequence analysis of exon 23 of the human
BRCA1 gene (E23) and cystic fibrosis (CF) (at the locus of nucleotide 549) genes
with a Klenow class polymerase. In the Table, the sign (+/-) indicates the
strand of the target being sequenced; the number (1-6) indicates the position of
the target being interrogated.

Table 4

SEQ ID NO   Oligo     Nucleotide Sequence
24          E23+1     CTTGTGCTGACTTACCAGATGGGAC
25          E23+2     TTGTGCTGACTTACCAGATGGGACA
26          E23+3     TGTGCTGACTTACCAGATGGGACAC
27          E23+4     GTGCTGACTTACCAGATGGGACACT
28          E23+5     TGCTGACTTACCAGATGGGACACTC
29          E23+6     GCTGACTTACCAGATGGGACACTCT
30          E23-1     GTCATTAATGCTATGCAGAAAATCT
31          E23-2     TCATTAATGCTATGCAGAAAATCTT
32          E23-3     CATTAATGCTATGCAGAAAATCTTA
33          E23-4     ATTAATGCTATGCAGAAAATCTTAG
34          E23-5     TTAATGCTATGCAGAAAATCTTAGA
35          E23-6     TAATGCTATGCAGAAAATCTTAGAG
36          CF549+1   AAAGAAATTCTTGCTCGTTGACCTC
37          CF549+2   AAGAAATTCTTGCTCGTTGACCTCC
38          CF549+3   AGAAATTCTTGCTCGTTGACCTCCA
39          CF549+4   GAAATTCTTGCTCGTTGACCTCCAC
40          CF549+5   AAATTCTTGCTCGTTGACCTCCACT
41          CF549-1   TTCTTGGAGAAGGTGGAATCACACT
42          CF549-2   TCTTGGAGAAGGTGGAATCACACTG
43          CF549-3   CTTGGAGAAGGTGGAATCACACTGA
44          CF549-4   TTGGAGAAGGTGGAATCACACTGAG
45          CF549-5   TGGAGAAGGTGGAATCACACTGAGT



As will be recognized, the use of a Klenow class polymerase permits
sequence determinations using far fewer than the maximum number of
oligonucleotides that would otherwise be required. Nevertheless, because
repair of mismatches may complicate analysis, Thermosequenase class
polymerases are the preferred polymerases of the present invention. Since such
polymerases do not repair mismatches, they are preferably used in
embodiments in which oligonucleotides having all possible permutations of 3'
sequence are provided, or more preferably, in embodiments in which two
oligonucleotide arrays are employed (one complementary to one strand, and
the other complementary to the second strand).

B. Random Nested Arrays 

Whereas the non-random nested array method described above is
predicated on providing the target molecule with hybridization
oligonucleotides that possess the exact sequence of the target, the random
nested array method is predicated on deriving sequence information from the
pattern of oligonucleotides of the array that are extended in the GBA™ reaction
as well as from the identity of the nucleotide added to each extended
oligonucleotide.

In the method, an array of oligonucleotide primers is employed. The
lengths of the primers are most preferably uniform, and can vary from 6-20
nucleotides in length. For an array of N nucleotides, there are $4^{N}$ possible
sequence permutations. However, because each oligonucleotide can (if
hybridized to target in a GBA™ reaction) be extended by one nucleotide, the
use of an array of random primers of N nucleotides in length can generate
sequence information for $4^{N+1}$ nucleotides. Hence, an array of 4,096
oligonucleotides (comprising a random permutation of all possible 6-mers)
could simultaneously sequence 16,384 bases of a target molecule.

The random array method may be illustrated as follows. An array of all
possible 6-mers is prepared such that the x,y array location and sequence of
each oligonucleotide of the array is known. Each array position is incubated
with the same target polynucleotide, and a GBA™ reaction is conducted for
each array position. These parallel (or sequential) reactions lead to the
formation of a sequence signature consisting of array positions whose
oligonucleotides have not been extended, and those whose oligonucleotides
have been extended by addition of A, C, G or T.

One array position for an extended oligonucleotide is selected at random
(although, in a preferred automated mode, multiple positions may be processed
in parallel). The sequence of the extended oligonucleotide at the selected array
position is determined using the oligonucleotide's initial (predefined 6-mer)
sequence and the identity of the labeled nucleotide added to the
oligonucleotide's 3' terminus in the GBA™ reaction. This determination defines
a second 6-mer oligonucleotide (consisting of nucleotides 2-7 of the selected
oligonucleotide). The array location of this second 6-mer position is identified,
and the extension product formed by the oligonucleotide at that array position
is determined. Such sequence information defines a third 6-mer
oligonucleotide (consisting of nucleotides 2-7 of the second selected
oligonucleotide). In like manner, the entire sequence stored in the array can be
deduced.
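
The chaining procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the array read-out is simulated as a dictionary mapping each extended 6-mer to the nucleotide added to it, every 6-mer is assumed to occur only once in the strand (repeats would make the walk ambiguous), and the 17-base strand, `simulate_extensions` and `walk` are hypothetical. The strand reconstructed is the one synthesized on the primers, i.e. the complement of the target.

```python
def simulate_extensions(primer_sense_strand: str, k: int = 6) -> dict:
    """Simulated read-out: each k-mer present maps to the base added to it."""
    extensions = {}
    for i in range(len(primer_sense_strand) - k):
        kmer = primer_sense_strand[i:i + k]
        extensions[kmer] = primer_sense_strand[i + k]   # assumes unique k-mers
    return extensions


def walk(extensions: dict, start_kmer: str) -> str:
    """Chain extended oligonucleotides, moving one nucleotide per step."""
    k = len(start_kmer)
    sequence = start_kmer
    current = start_kmer
    visited = {start_kmer}
    while current in extensions:
        sequence += extensions[current]     # append the nucleotide that was added
        current = sequence[-k:]             # nucleotides 2-7 of the extended oligo
        if current in visited:              # a repeat: stop rather than loop
            break
        visited.add(current)
    return sequence


# Hypothetical strand, chosen only because all of its 6-mers are distinct.
STRAND = "ATGCGTACCTGAAGTCA"
readout = simulate_extensions(STRAND)

print(walk(readout, "ATGCGT"))   # walk started at the first 6-mer
print(walk(readout, "GTACCT"))   # walk started at an interior 6-mer
```

Because this walk only proceeds in the direction of extension, a start position chosen at random recovers the sequence downstream of that oligonucleotide; recovering the upstream portion would require repeating the walk from other extended positions.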

A salient feature of the use of the GBA™ reaction in accordance with the
methods of the present invention is the capacity to miniaturize such methods,
resulting in a savings of space, reagents, and time, and providing increased
throughput and reliability.

III. Microminiaturized Analysis Method

In one embodiment, a microminiaturized analysis format is employed.
As used herein, a microminiaturized reaction is one conducted in a reaction
volume of greater than 50 but less than 200 µl, and most preferably less than
100 µl. Such analysis is most preferably conducted in 96-well microtiter
plates, using the indirect fluorescent colorimetry method of Huang, Z. et al.
(Anal. Biochem. 207:32-39 (1992)), and the use of liquid handling robots to
deliver reagents.

A preferred format involves designing the GBA™ primers so that they
are associated with biotinylated spacer arms sufficient to permit them to
become bound to a glass or plastic support (such as a glass slide, etc.). This
attachment approach has the advantage of high specificity and results in
minimal nonspecific backgrounds during attachment and hybridization. A
preferred glass slide support for oligonucleotide immobilization has wells of
exposed glass surrounded by a hydrophobic Teflon coating (Cel-line
Associates, Inc.). The plates have 12 wells (7 mm in diameter), and are
designed such that solutions can be dispensed with standard, multichannel
pipetting instruments, and signals can be read on existing plate readers. Avidin
will be covalently attached onto a glass slide using our proprietary attachment
chemistry. A 50 µl solution of 0.4 µM biotinylated oligonucleotide will then be
added to each well, and incubated for 2 hrs, then rinsed with TNTw (10 mM
Tris-HCl, pH 7.5, 150 mM NaCl, 0.05% Tween-20).

IV. Nanominiaturized Analysis Method

In an alternate embodiment, a nanominiaturized analysis format is
employed. As used herein, a nanominiaturized reaction is one conducted in a
reaction volume of less than 50 µl, and most preferably less than 10 µl.

In a preferred nanominiaturized embodiment, the support will be an
etched glass plate that will hold several hundred to several thousand
nanowells (0.1 - 5 µl volume per well), such that entire arrays can be evaluated
simultaneously. The determination of the result of the GBA™ reaction will
most preferably be performed via automated processing using, for example, a
pixel by pixel CCD camera equipped to distinguish the labels of the nucleotides
being employed. Detection of the extension may be accomplished using a
variety of labels; however, two detection schemes are preferred: i) direct
fluorescence detection on glass, and ii) enzyme-mediated fluorescence
detection.

Having now generally described the invention, the same will be more
readily understood through reference to the following examples which are
provided by way of illustration, and are not intended to be limiting of the
present invention, unless specified.

EXAMPLES 

Example 1 
Nested GBA™ Analysis 

In order to demonstrate the biochemical feasibility of adapting GBA™ 
technology to determine all 4 bases at each nucleotide position within a string 
of sequence, the following N-GBA™ experiment was conducted. A target 
polynucleotide having the sequence: 

SEQ ID NO:46 
(Wild-type) 5' CCAGAAGAAA GGGCCTTCAC AGTGTCCTTT 
ATGTAAGAAT GATATAACC 3' 

or 

SEQ ID NO:47 
(Mutant) 5' CCAGAAGAAA GGGCCTTCAC AGGGTCCTTT 
ATGTAAGAAT GATATAACC 3' 

was interrogated with a set of primers that had been immobilized onto the 
surface of a 96 well microtiter plate in order to type the central five bases 
(shown in boldface) of the "wild-type" sequence (AGTGT) and of a single-base 
"mutant" sequence (AGGGT). The primers used had the following sequences: 

SEQ ID NO:48 (Primer 1) 5' GGTTATATCATTCTTACATAAAGG 3' 
SEQ ID NO:49 (Primer 2) 5' GTTATATCATTCTTACATAAAGGA 3' 
SEQ ID NO:50 (Primer 3) 5' TTATATCATTCTTACATAAAGGAC 3' 
SEQ ID NO:51 (Primer 4) 5' TATATCATTCTTACATAAAGGACA 3' 
SEQ ID NO:52 (Primer 5) 5' ATATCATTCTTACATAAAGGACAC 3' 

Two commercially available DNA polymerases, the Klenow fragment of 
E. coli DNA polymerase I and the thermostable Thermosequenase (USB), were 
used for the single-base extension reaction. Primers were immobilized onto 
polystyrene plates via cationic detergent (octyldimethylamine) promoted 
passive adsorption (Nikiforov, T.T. et al., Anal Biochem 227:201-209 (1995)) at 
defined locations. The wild-type and mutant templates were hybridized to the 
immobilized GBA™ primers, and the 3' ends of the GBA™ primers were 
extended by a single fluorescent labeled chain terminator ddNTP by either 
Klenow or Thermosequenase. Enzyme-mediated fluorescence signals were 
obtained using the Cytofluor II fluorescent plate reader. The results of the 
experiment are shown in Table 5. 

As shown in Table 5, the final colorimetric readouts from the extensions 
of Klenow fragment and Thermosequenase with the matching primer set and 
wild-type template were consistent with the true base sequence. When the 
mutant template was present, however, the two DNA polymerases gave quite 
different readout patterns. Klenow, known for its 3' to 5' exonuclease activity, 
was able to correct the 3' base mismatches of Primers 4 and 5 with the mutant 
template and extend only the C base from these primers. On the other hand, 
Thermosequenase could not repair and extend at any of these mismatches, 
resulting in a lack of signal for both Primers 4 and 5. Either enzyme could 
produce very distinct and differential patterns of colorimetric readout for the 
wild-type and mutant templates, demonstrating the use of this N-GBA™ 
approach to screen for mutations. 
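
The nesting logic of this example can be sketched in a few lines of Python. This is only an illustration, not part of the experimental protocol: the function names are hypothetical, and the sketch assumes that each primer is a fixed-length window of the strand complementary to the template, shifted by one base per primer, so that the chain terminator expected on each primer is the next base of that complementary strand.

```python
# Illustrative sketch only; function names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def nested_primers(template: str, first_primer: str, count: int):
    """Return (primer, expected_base) pairs for a nested primer set.

    Each primer is a window of the strand complementary to `template`,
    shifted one base 3' of the previous primer. The expected base is the
    ddNTP a polymerase would add to a perfectly matched primer.
    """
    antisense = reverse_complement(template)
    start = antisense.find(first_primer)
    if start < 0:
        raise ValueError("first primer is not complementary to the template")
    length = len(first_primer)
    pairs = []
    for offset in range(count):
        begin = start + offset
        primer = antisense[begin:begin + length]
        expected_base = antisense[begin + length]
        pairs.append((primer, expected_base))
    return pairs


# Wild-type template (SEQ ID NO:46) and Primer 1 (SEQ ID NO:48) from the text.
WILD_TYPE = "CCAGAAGAAAGGGCCTTCACAGTGTCCTTTATGTAAGAATGATATAACC"
PRIMER_1 = "GGTTATATCATTCTTACATAAAGG"

pairs = nested_primers(WILD_TYPE, PRIMER_1, 5)
for number, (primer, base) in enumerate(pairs, start=1):
    print(f"Primer {number}: 5' {primer} 3' -> expected ddNTP {base}")
```

Under these assumptions the five expected bases are the complements of the typed bases of the template, which is why the wild-type rows of Table 5 are read against AGTGT.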
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Table 5 


                                          Base Extended
Polymerase        Template    Primer       A      G      T      C

Klenow            Wild-type   Primer 1    2.0    0.43   0.15   0.24
                              Primer 2    0.94   0.42   0.16   2.0
                              Primer 3    2.0    0.30   0.15   0.79
                              Primer 4    0.30   0.15   0.19   1.9
                              Primer 5    0.28   0.14   1.7    0.55
                  Mutant      Primer 1    0.96   0.45   0.15   0.16
                              Primer 2    0.88   0.43   0.17   1.1
                              Primer 3    0.35   0.33   0.23   1.8
                              Primer 4    0.26   0.16   0.10   1.3
                              Primer 5    0.25   0.15   0.11   1.3

Thermosequenase   Wild-type   Primer 1    2.2    0.28   0.11   0.14
                              Primer 2    0.33   0.18   0.15   2.1
                              Primer 3    2.1    0.16   0.12   0.22
                              Primer 4    0.20   0.11   0.15   2.1
                              Primer 5    0.15   0.12   2.2    0.16
                  Mutant      Primer 1    1.2    0.19   0.11   0.12
                              Primer 2    0.23   0.19   0.15   1.3
                              Primer 3    0.22   0.14   0.13   1.5
                              Primer 4    0.10   0.10   0.10   0.16
                              Primer 5    0.10   0.10   0.15   0.14


Overall, these data reveal two important points: 1) Thermosequenase 
reduced template dependent noise, as shown by its better S:N ratios when 
compared to Klenow, and 2) Thermosequenase did not extend at a non-specific 
base (i.e., it stopped when the primer overlapped the non-specific base), thus 
clearly indicating a mismatch which can be used to locate the position of the 
mutation. These advantages suggest that the exonuclease-free Thermosequenase 
enzyme is better suited to the N-GBA™ technology; however, since 
Thermosequenase does not give false data at a mismatch, SCAN™ must also be 
performed from the other strand to determine the sequence that follows the 
mutation. 
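
The S:N comparison can be made concrete with a short calculation. The text does not state how S:N was computed, so the definition used here (strongest signal divided by the mean of the other three signals for the same primer) is an assumption made only for illustration. The readings are the wild-type rows for Primers 2 through 5 of Table 5, in the column order A, G, T, C.

```python
# Illustrative sketch; the S:N definition below is an assumption, not the
# definition used by the authors.
def signal_to_noise(signals):
    """Strongest signal divided by the mean of the remaining signals."""
    ordered = sorted(signals, reverse=True)
    noise = sum(ordered[1:]) / len(ordered[1:])
    return ordered[0] / noise


# Wild-type readings from Table 5 (columns A, G, T, C).
KLENOW = {
    "Primer 2": [0.94, 0.42, 0.16, 2.0],
    "Primer 3": [2.0, 0.30, 0.15, 0.79],
    "Primer 4": [0.30, 0.15, 0.19, 1.9],
    "Primer 5": [0.28, 0.14, 1.7, 0.55],
}
THERMOSEQUENASE = {
    "Primer 2": [0.33, 0.18, 0.15, 2.1],
    "Primer 3": [2.1, 0.16, 0.12, 0.22],
    "Primer 4": [0.20, 0.11, 0.15, 2.1],
    "Primer 5": [0.15, 0.12, 2.2, 0.16],
}

for primer in KLENOW:
    klenow_sn = signal_to_noise(KLENOW[primer])
    thermo_sn = signal_to_noise(THERMOSEQUENASE[primer])
    print(f"{primer}: Klenow S:N {klenow_sn:.1f}, Thermosequenase S:N {thermo_sn:.1f}")
```

With this definition, Thermosequenase gives the higher ratio for each of the four primers, which is in line with point 1) above.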

Example 2 

Nested GBA™ Analysis On Glass Slides 

The feasibility of conducting nested GBA™ (N-GBA™) reactions on glass 
slides was evaluated. For this purpose, 25-mer GBA™ primers were 5' 
specifically attached onto the surface of a glass slide via avidin-biotin affinity 
interactions. The glass slides had wells of exposed glass surrounded by 
hydrophobic Teflon coating (Cel-line Associates, Inc.). The 12 wells were 7 mm 
in diameter, and were designed such that solutions could be dispensed with 
standard, multichannel pipetting instruments, and signals could be read on 
existing plate readers. A 50 µl solution of 0.4 µM biotinylated oligonucleotide 
was added to each well, incubated for 2 hrs (1.5 mM NaCl, 10 mM EDTA, and 
0.5 µM target synthetic template strands), and then rinsed with TNTw (10 mM 
Tris-HCl, pH 7.5, 150 mM NaCl, 0.05% Tween-20). 

GBA™ biochemistry (Nikiforov, T.T. et al., Nucl Acids Res 22:4167-4175 
(1994); Nikiforov, T.T. et al., PCR Methods and Apps 3:285-291 (1994), both herein 
incorporated by reference) was used to analyze the synthetic templates; each 
synthetic template was split into four different wells, and each well was treated 
with extension mix containing all the extension reaction components, 
exonuclease free Klenow fragment of E. coli polymerase, and each of four 
fluorescein-labeled ddNTPs and co-ddNTPs. Enzyme-mediated fluorescence 
signals were obtained using the Cytofluor II fluorescent plate reader. Synthetic 
template 1 was designed to give a GBA™ signal in base A, and synthetic 
template 2 was designed to give a GBA™ signal in base G. 



The GBA™ extension reactions are detected using the enzyme-mediated 
fluorescence precipitation method (Huang, Z. et al., Anal Biochem 207:32-39 
(1992); Huang, Z. et al., J Histochem Cytochem 41:313-317 (1993)). The glass slides 
containing the fluorescein GBA™ signal are incubated for about 30 minutes 
with anti-fluorescein alkaline phosphatase solution under a blocking condition 
commonly used in ELISA or histochemical procedures. After washing, a 
droplet of an alkaline phosphatase fluorogenic precipitating substrate solution 
(Molecular Probes) is applied to either individual reaction wells or the entire 
slide. Following a 15 minute incubation and wash, the GBA™ signal can be 
immediately visualized under a conventional fluorescent microscope equipped 
with a 360 nm excitation filter and a 530 nm emission filter, or quantitated by a 
fluorescence microtiter plate scanner (Cytofluor II) equipped with the same 
filter set. 

The results of this experiment are shown in Figure 1. The results were as 
expected: both templates gave strong signals in correct bases with virtually no 
noise in other bases observed (the S:N ratio ranged from 28 to 14.2). This 
experiment demonstrated the feasibility of performing GBA™ biochemistry on 
glass, and of detecting GBA™ signal by sensitive enzyme-mediated 
fluorescence detection using a commercially available fluorescent plate reader, 
the Cytofluor II. The high quality of the results strongly suggests that the 
proposed N-GBA™ biochemistry should perform very well on a glass surface 
when combined with the enzyme-mediated fluorescence detection, and puts us 
on the path towards a low-cost miniaturizable GBA™ processing format. 

Example 3 

Nested GBA™ Analysis of the BRCA1 Gene 

The feasibility of utilizing the nested GBA™ (N-GBA™) approach to 
accurately identify mutations in exon 23 of the human BRCA1 gene was 
evaluated. 

Mutations in the human BRCA1 gene have been implicated as correlated 
with familial breast cancer. In particular, a mutation located at position 354-359 
of the normal (wild-type) sequence (TAGAGT) has been correlated with 
familial breast cancer. Primers having the sequences SEQ ID Nos: 24-29 and 
30-35 were used to sequence sample BRCA1 genes (Table 6). 







Table 6 

SEQ ID NO   Oligo    Nucleotide Sequence of E23 

53                   TCTTAGAGTGTCCCATCTGGTAAGTCAGCACAAG 
24          E23+1    CTTGTGCTGACTTACCAGATGGGAC 
25          E23+2    TTGTGCTGACTTACCAGATGGGACA 
26          E23+3    TGTGCTGACTTACCAGATGGGACAC 
27          E23+4    GTGCTGACTTACCAGATGGGACACT 
28          E23+5    TGCTGACTTACCAGATGGGACACTC 
29          E23+6    GCTGACTTACCAGATGGGACACTCT 

SEQ ID NO   Oligo    Nucleotide Sequence 

54                   GACACTCTAAGATTTTCTGCATAGCATTAATGAC 
30          E23-1    GTCATTAATGCTATGCAGAAAATCT 
31          E23-2    TCATTAATGCTATGCAGAAAATCTT 
32          E23-3    CATTAATGCTATGCAGAAAATCTTA 
33          E23-4    ATTAATGCTATGCAGAAAATCTTAG 
34          E23-5    TTAATGCTATGCAGAAAATCTTAGA 
35          E23-6    TAATGCTATGCAGAAAATCTTAGAG 



Thus, nested GBA™ reactions were performed using Klenow and exo- 
Klenow polymerase, and fluorescein labeled ddNTPs. The results of this 
experiment are shown in Table 7. 

Table 7 

Nested GBA™ Reaction Using "+" Template Strand of E23 of BRCA1 

Nucleotide Extended Using Exo-Klenow 

Primer          A               C               G               T 
Used       Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
E23+1       3.07   0.15     0.93   0.12     0.38   0.19     0.38   0.12 
E23+2       0.94   0.14     2.97   0.15     0.35   0.16     0.45   0.14 
E23+3       0.42   0.11     0.41   0.10     0.26   0.11     2.?    0.10 
E23+4       0.49   0.15     3.21   0.16     0.38   0.34     0.75   0.16 
E23+5       0.37   0.14     0.49   0.14     0.31   0.14     2.70   0.13 
E23+6       3.13   0.22     0.40   0.18     0.62   2.82     0.62   0.16 

Nucleotide Extended Using Klenow 

Primer          A               C               G               T 
Used       Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
E23+1       3.34   0.38     1.38   0.21     0.81   0.21     0.41   0.15 
E23+2       1.55   0.35     2.96   0.15     0.58   0.19     0.44   0.17 
E23+3       1.04   0.38     1.17   0.13     0.52   0.16     3.08   0.12 
E23+4       1.03   0.57     3.35   0.13     0.69   0.45     1.29   0.22 
E23+5       0.57   0.12     1.54   0.15     0.47   0.31     3.34   0.19 
E23+6       3.36   0.31     0.88   0.17     1.41   1.64     1.12   0.18 


The results shown in Table 7 thus show that Klenow and Exo-Klenow 
gave the same sequence (ACTCTA) for the primer extension, thereby indicating 
that the "+" strand of the E23 locus being sequenced had the complementary 
sequence (5' TAGAGT 3'). To confirm this result, a nested GBA™ reaction was 
performed using the "-" template strand of E23 of BRCA1. The results of this 
experiment are shown in Table 8. 

Table 8 

Nested GBA™ Reaction Using "-" Template Strand of E23 of BRCA1 

Nucleotide Extended Using Exo-Klenow 

Primer          A               C               G               T 
Used       Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
E23-1       0.54   0.08     0.28   0.11     0.43   0.22     2.00   0.12 
E23-2       2.04   0.09     0.19   0.08     0.31   0.08     0.68   0.08 
E23-3       0.69   0.11     0.16   0.10     3.29   0.09     0.46   0.08 
E23-4       3.15   0.36     0.18   0.14     0.44   0.10     0.69   0.09 
E23-5       0.26   0.14     0.17   0.09     2.34   0.10     1.00   1.37 
E23-6       0.35   0.10     0.24   0.09     0.47   0.09     2.57   0.24 

Nucleotide Extended Using Klenow 

Primer          A               C               G               T 
Used       Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
E23-1       0.47   1.24     0.78   0.12     0.66   0.37     3.43   0.70 
E23-2       1.78   0.12     0.?6   0.08     0.24   0.1      0.96   0.11 
E23-3       1.48   0.16     0.1    0.09     3.42   0.24     0.97   0.31 
E23-4       3.30   0.22     0.29   0.08     0.80   0.15     0.72   0.30 
E23-5       0.33   0.22     0.17   0.09     2.?9   0.11     2.57   1.85 
E23-6       0.73   0.14     0.25   0.11     0.95   0.17     3.48   0.42 


The results shown in Table 8 thus show that Klenow and Exo-Klenow 
gave the same sequence (TAGAGT) for the primer extension, thereby indicating 
that the "-" strand of the E23 locus being sequenced had the complementary 
sequence (5' ACTCTA 3'). 
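
The way a sequence is read from such a table can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis procedure: the base called for each primer is simply the one with the strongest signal, and the template sequence is taken as the reverse complement of the string of called bases. The readings are the Klenow "Signal" columns of Table 7, in the order A, C, G, T; the function name is hypothetical.

```python
# Illustrative sketch; "strongest signal wins" is an assumed calling rule.
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def call_base(signals):
    """Return the base whose signal is highest (signals ordered A, C, G, T)."""
    best_index = max(range(len(BASES)), key=lambda i: signals[i])
    return BASES[best_index]


# Klenow "Signal" readings for the "+" template strand (Table 7).
KLENOW_PLUS = [
    ("E23+1", [3.34, 1.38, 0.81, 0.41]),
    ("E23+2", [1.55, 2.96, 0.58, 0.44]),
    ("E23+3", [1.04, 1.17, 0.52, 3.08]),
    ("E23+4", [1.03, 3.35, 0.69, 1.29]),
    ("E23+5", [0.57, 1.54, 0.47, 3.34]),
    ("E23+6", [3.36, 0.88, 1.41, 1.12]),
]

extended = "".join(call_base(signals) for _, signals in KLENOW_PLUS)
template = extended.translate(COMPLEMENT)[::-1]

print("extended bases :", extended)
print("template strand:", template)
```

Applied to the Klenow readings, this rule gives the extension string and the complementary template sequence reported in the text for Table 7.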

In order to demonstrate the ability of the present invention to discern 
mutations in the E23 locus, two additional experiments were performed. In the 
first experiment, a target strand having a deletion of the AG nucleotides was 
analyzed. In the second, a mixture of the normal and AG deletion target 
template was made, and analyzed via the nested GBA™ method. The first 
experiment thus discerns the profile that would be presented by an individual 
having a homozygous mutation in the E23 locus, while the second experiment 
analyzes the profile that would be presented by an individual having 
heterozygosity in this region. The results of these experiments are shown in 
Tables 9 and 10. 

Table 9 

Nested GBA™ Reaction Using "AG" Deletion in "+" Template Strand 
of E23 of BRCA1 

                     Nucleotide Extended 
Primer     A Signal   C Signal   G Signal   T Signal 
E23+1        2.77       0.13       0.11       0.1 
E23+2        0.22       2.45       0.11       0.11 
E23+3        0.12       0.13       0.1        1.51 
E23+4        1.53       0.1        0.1        0.11 
E23+5        1.53       0.1        0.11       0.12 
E23+6        0.24       0.11       0.61       0.1? 



The results shown in Table 9 define an extended sequence for this 
sample of ACTAA, thereby indicating that the strand of the E23 locus being 
sequenced had the complementary sequence (5' TTAGT 3') (see SEQ ID NO:53). 
The observed sequence is explained as follows: Primers 1-3 sequence bases that 
precede the deletion, and hence report the wild-type sequence (ACT). Primer 4, 
which ends just before the deletion, reports the sequence of the first nucleotide 
of the target strand that follows the deletion (i.e., A). Primer 5, when hybridized 
to the deletion, ends with a one base mismatch, which is removed by the 
polymerase. The truncated hybridized Primer 5 then sequences the same 
nucleotide as that sequenced by Primer 4. Primer 6, which has a two base 
mismatch, is not extended in the reaction. 



Table 10 

Nested GBA™ Reaction Using A Mixture Of "AG" 
Deletion And Normal "+" Template Strand of E23 of BRCA1 

                     Nucleotide Extended 
Primer     A Signal   C Signal   G Signal   T Signal 
E23+1        3.31       0.39       0.34       0.36 
E23+2        0.3        2.37       0.39       0.41 
E23+3        0.29       0.35       0.33       1.31 
E23+4        1.97       2.33       0.38       0.4 
E23+5        0.51       0.38       0.53       1.64 
E23+6        3.21       0.4        0.41       0.43 


Table 10 reveals that Primers 1-3 were extended as expected to yield 
extension products A, C, and T, respectively, for both wild-type and AG 
deletion target molecules. The presence of wild-type target results in the 
extension of Primer 4 with a C residue (consistent with the results obtained 
above; see Table 7). Similarly, the presence of the wild-type target causes 
Primers 5 and 6 to be extended by T and A, respectively (see Table 7). The 
presence of the AG deletion target causes Primer 4 to be extended by an A 
(consistent with the result shown in Table 9). Consistent with the fact that the 
target mixture is 1:1 wild-type:mutant, the signals of A and C addition for 
Primer 4 are approximately equal. Neither Primer 5 nor Primer 6 is extended 
when hybridized to the AG deletion target because their 3' terminal nucleotides 
would not be base-paired with the AG deletion target mutant. The failure of 
Primer 5 to be extended when hybridized to the AG mutant reflects the 
relatively lower binding avidity of the polymerase for Primer 5:mutant 
duplexes as compared to Primer 5:wild-type duplexes (in which there would be 
no mismatch). 
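
The heterozygous profile described above can be picked out of the Table 10 readings with a simple thresholding rule. The sketch below is illustrative only: the rule (call every base whose signal is at least half of the strongest signal for that primer) and the cut-off of 0.5 are assumptions chosen for this example, not values taken from the text.

```python
# Illustrative sketch; the 0.5 cut-off is an assumed, hypothetical parameter.
def call_bases(signals, fraction=0.5):
    """Return every base whose signal is >= fraction * strongest signal."""
    strongest = max(signals.values())
    return "".join(base for base in "ACGT" if signals[base] >= fraction * strongest)


# Readings from Table 10 (1:1 mixture of normal and AG deletion templates).
MIXTURE = {
    "E23+1": {"A": 3.31, "C": 0.39, "G": 0.34, "T": 0.36},
    "E23+2": {"A": 0.30, "C": 2.37, "G": 0.39, "T": 0.41},
    "E23+3": {"A": 0.29, "C": 0.35, "G": 0.33, "T": 1.31},
    "E23+4": {"A": 1.97, "C": 2.33, "G": 0.38, "T": 0.40},
    "E23+5": {"A": 0.51, "C": 0.38, "G": 0.53, "T": 1.64},
    "E23+6": {"A": 3.21, "C": 0.40, "G": 0.41, "T": 0.43},
}

calls = {primer: call_bases(signals) for primer, signals in MIXTURE.items()}
heterozygous = [primer for primer, bases in calls.items() if len(bases) > 1]

for primer, bases in calls.items():
    print(primer, "->", "/".join(bases))
print("two-base positions:", heterozygous)
```

Under this rule only Primer 4 (E23+4) reports two bases, A and C, which matches the interpretation given for the 1:1 mixture.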
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Example 4 
Nested GBA™ Analysis of the CF Gene 

The feasibility of utilizing the nested GBA™ (N-GBA™) approach to 
accurately identify mutations in the nucleotide 549 locus of the human CF gene 
was also evaluated. 

Thus, a set of primers (shown in Table 11) was prepared, and used in a 
nested GBA™ reaction to sequence a locus of the cystic fibrosis gene (CF) 
around nucleotide 549. 



Table 11 

SEQ ID NO   Oligo      Nucleotide Sequence 

55                     CTGAGTGGAGGTCAACGAGCAAGAATTTCTTT 
36          CF549+1    AAAGAAATTCTTGCTCGTTGACCTC 
37          CF549+2    AAGAAATTCTTGCTCGTTGACCTCC 
38          CF549+3    AGAAATTCTTGCTCGTTGACCTCCA 
39          CF549+4    GAAATTCTTGCTCGTTGACCTCCAC 
40          CF549+5    AAATTCTTGCTCGTTGACCTCCACT 

SEQ ID NO   Oligo      Nucleotide Sequence 

56                     TCCACTCAGTGTGATTCCACCTTCTCCAAGAA 
41          CF549-1    TTCTTGGAGAAGGTGGAATCACACT 
42          CF549-2    TCTTGGAGAAGGTGGAATCACACTG 
43          CF549-3    CTTGGAGAAGGTGGAATCACACTGA 
44          CF549-4    TTGGAGAAGGTGGAATCACACTGAG 
45          CF549-5    TGGAGAAGGTGGAATCACACTGAGT 



Table 12 shows the result of this experiment with respect to the "+" 
strand of this target molecule. 

Table 12 

Nested GBA™ Reaction Using "+" Template Strand of CF Gene At 
Locus 549 

Nucleotide Extended Using Exo-Klenow 

Primer           A               C               G               T 
Used        Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
CF549+1      0.?    0.09     2.99   0.09     0.20   0.09     0.79   0.09 
CF549+2      3.49   0.10     0.74   0.10     0.19   0.10     0.25   0.10 
CF549+3      1.01   0.1      3.23   0.12     0.26   0.13     0.38   0.12 
CF549+4      0.47   0.19     0.89   0.16     0.56   0.25     2.76   0.14 
CF549+5      0.30   0.11     2.97   0.12     0.21   0.13     0.39   0.12 

Nucleotide Extended Using Klenow 

Primer           A               C               G               T 
Used        Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
CF549+1      0.43   0.15     3.37   0.10     0.29   0.15     0.82   0.64 
CF549+2      3.45   0.36     1.38   0.10     0.33   0.22     0.48   0.14 
CF549+3      1.52   0.13     3.60   0.11     0.36   0.18     0.51   0.11 
CF549+4      1.41   0.17     1.87   0.13     0.92   0.26     3.48   0.15 
CF549+5      0.60   0.11     3.12   0.12     0.28   0.15     0.39   0.11 


As indicated in Table 12, both Klenow and Exo-Klenow gave nested 
GBA™ extension products of C, A, C, T and C, respectively, for primers CF549+1 
through CF549+5. The deduced sequence for the 549 locus of the target is 
therefore GAGTG, as expected. The results obtained above were confirmed by 
performing a nested GBA™ reaction using the "-" CF strand. The results of this 
experiment are presented in Table 13. 

Table 13 

Nested GBA™ Reaction Using "-" Template Strand of CF Gene At 
Locus 549 

Nucleotide Extended Using Exo-Klenow 

Primer           A               C               G               T 
Used        Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
CF549-1      0.37   0.11     0.28   0.15     3.09   0.10     0.45   0.13 
CF549-2      3.32   0.11     0.25   0.13     0.52   0.13     0.27   0.12 
CF549-3      1.65   0.12     0.19   0.19     3.88   0.13     1.34   0.19 
CF549-4      0.69   0.09     0.11   0.12     1.16   0.10     2.97   0.09 
CF549-5      1.10   0.12     0.56   0.31     3.45   0.54     1.45   0.15 

Nucleotide Extended Using Klenow 

Primer           A               C               G               T 
Used        Signal   TIN    Signal   TIN    Signal   TIN    Signal   TIN 
CF549-1      1.31   0.13     0.77   0.53     3.37   0.17     1.21   0.21 
CF549-2      3.52   0.13     0.74   0.63     1.36   0.15     0.89   0.?1 
CF549-3      1.65   0.12     0.19   0.19     3.88   0.13     1.34   0.19 
CF549-4      0.69   0.09     0.11   0.12     1.16   0.10     2.97   0.09 
CF549-5      1.10   0.12     0.56   0.31     3.45   0.54     1.45   0.15 


As indicated in Table 13, both Klenow and Exo-Klenow gave nested 
GBA™ extension products of G, A, G, T and G, respectively, for primers CF549-1 
through CF549-5. The deduced sequence for the 549 locus of the target is 
therefore CACTC, as expected. The results obtained above were confirmed by 
performing a nested GBA™ reaction using the "-" CF strand. The results of this 
experiment are presented in Table 13. Klenow, Exo-Klenow and Sequenase 
were compared for their ability to serve as the polymerase in the nested GBA™ 
reaction shown in Example 13. The enzymes gave equivalent N-GBA™ results. 
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Example 5 

Nested GBA™ Analysis of Hot Spots in the p53 Gene 

The feasibility of utilizing the nested GBA™ (N-GBA™) approach to 
accurately detect p53 mutations was evaluated. 

The p53 gene encompasses an approximately 19 kilobase stretch, 
comprising 11 exons (393 codons), of chromosome region 17p13.105-p12. 
Characterized as a tumor antigen in 1979, then as an oncogene, and finally as a 
tumor suppressor gene, p53 has received increasing study in cancer research. 
Mutations in the p53 gene are the single most common genetic alteration in 
human cancers and generally result in loss of function of the protein. The p53 
protein's apparent role in regulating cell growth and apoptosis suggests it is a 
core protein in determination of tumorigenesis, with mutations in p53 being 
part of the cascade necessary for the development of many tumors. Three 
quarters of colon cancers and half of lung and breast cancers have been 
reported to contain p53 mutations (Levine, A.J., Canc. Surveys 12:59-79 (1992); 
herein incorporated by reference). Since more than 100,000 additional cases of 
each of these cancers are diagnosed each year, the potential application of p53 
analysis is significant both clinically and commercially. The majority of p53 
mutations are missense (ranging from 75% to more than 90%), tightly clustered 
between codons IIS and 309, the DNA binding region of the protein. Amino 
acids 175, 248, 249, 273, and 282 account for 40% of the total reported missense 
mutations, and the predominance of these so-called "hot-spots" varies 
depending on the tissue of origin of the cancer. The diversity and dispersion of 
clinically relevant mutations pose a significant challenge to the development of 
routine detection strategies. Because of the high prevalence of p53 mutations in 
a wide variety of common cancers and the large number of potential mutations 
in a defined gene region, p53 is an excellent target for development of a 
sequence composition/confirmation analysis tool such as SCAN™. 



Nested GBA™ primers were designed for all DNA bases in a hot-spot 
(codons 272-282) of the target p53 gene. Figure 2 displays the four mutational 
hot-spot regions of the p53 gene with the wild-type and known representative 
mutant sequences of codons 272 to 282 (region D) highlighted. Specifically, 
three synthetic templates are designed to match three DNA samples, each 
containing a mutation in either codon 273, 275, or 281. Two additional synthetic 
templates are designed to be representative of a deletion mutation (codons 266 
and 267 deleted) and an insertion mutation (C insertion at codon 280). 

One PCR primer for each primer set will have four phosphorothioate 
linkages at its 5' end in preparation for TargEx™ treatment. TargEx™ is a 
method developed to quantitatively convert double-stranded PCR product into 
single-stranded DNA by selectively degrading one of the strands with 
bacteriophage T7 gene 6 exonuclease (Nikiforov, T.T. et al., Nucl Acids Res 
22:4167-4175 (1994); Nikiforov, T.T. et al., PCR Methods and Apps 3:285-291 
(1994)). Specifically, PCR product amplified from human genomic DNA using 
one fluorescein-labeled, phosphorothioated PCR primer and one unmodified 
primer is treated with T7 gene 6 exonuclease (U. S. Biochemical) at a final 
concentration of 2 U/µl PCR (diluted in buffer supplied by manufacturer). After 
1 hr of incubation at room temperature, NaCl and EDTA are added to a 
concentration of 1.5 M and 10 mM, respectively, to stop the exonuclease 
digestion. The mixture is then applied to the immobilized GBA™ primer for 
subsequent hybridization and extension. After extension, the standard ABI 
fluorescent cycle sequencing system is used to analyze the reaction. 

The 5' ends of the primers are specifically attached to glass slides to form 
a SCAN™ array. Synthetic oligonucleotide templates corresponding to portions 
of the target hot-spot and containing various known mutations are used to test 
the array and the GBA™ biochemistry to demonstrate that robust, 
unambiguous (low noise and background) data can be obtained from such an 
analysis. Permutations of the standard GBA™ biochemistry, in particular the 
use of different DNA polymerases, are evaluated to ensure optimal signal:noise 
(S:N) characteristics for all 4 nucleotides in the feasibility test system. 

Primer pairs will be qualified by amplification of human genomic DNA 
at a concentration of 12.5 µg/ml in 30 µl reactions in 96 well V-bottom 
polycarbonate plates (Costar). The final concentration of the reaction mixture 
will be 400 µM each dNTP, 50 mM KCl, 10 mM Tris HCl (pH 8.5), 1.5 mM 
MgCl2, 0.5 µM each primer, 2.5 ng/µl DNA, and 0.025 U/µl Taq DNA 
polymerase (Perkin-Elmer). Each reaction will be overlayed with 30 µl mineral 
oil and cycled in a BioHI thermocycler (Sun Bioscience Inc., Branford CT). 
Following an initial two minute denaturation step at 94°C, 35 cycles will be 
carried out, each consisting of denaturation (1 min at 94°C), annealing (2 min at 
55°C), and extension (3 min at 72°C). Ten µl of PCR product will be run on 15% 
non-denaturing polyacrylamide gels at 40W for 40 min to analyze yield. The 
amplification products will be quantified by comparison with multiple 
dilutions of a Mass Marker (BRL). 
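
As a worked check of the arithmetic implied by this reaction set-up, the per-reaction amounts and the programmed cycling time can be computed as below. This is only a sketch: the variable names are hypothetical, the concentrations are those read from the paragraph above (the Taq value of 0.025 U/µl is an assumed reading of the printed text), and ramp times between temperature steps are ignored.

```python
# Illustrative arithmetic only; names are hypothetical and the Taq
# concentration is an assumed reading of the printed value.
REACTION_VOLUME_UL = 30

# Final concentrations from the text.
DNTP_UM_EACH = 400        # micromolar, each dNTP
PRIMER_UM_EACH = 0.5      # micromolar, each primer
DNA_NG_PER_UL = 2.5       # ng per microliter
TAQ_U_PER_UL = 0.025      # units per microliter (assumed reading)

# micromolar x microliters = picomoles
dntp_pmol_each = DNTP_UM_EACH * REACTION_VOLUME_UL
primer_pmol_each = PRIMER_UM_EACH * REACTION_VOLUME_UL
dna_ng = DNA_NG_PER_UL * REACTION_VOLUME_UL
taq_units = TAQ_U_PER_UL * REACTION_VOLUME_UL

# Cycling programme: 2 min initial denaturation, then 35 cycles of
# 1 min (94 C) + 2 min (55 C) + 3 min (72 C); ramp times are not counted.
INITIAL_DENATURATION_MIN = 2
CYCLES = 35
MINUTES_PER_CYCLE = 1 + 2 + 3
total_minutes = INITIAL_DENATURATION_MIN + CYCLES * MINUTES_PER_CYCLE

print(f"each dNTP  : {dntp_pmol_each / 1000:.0f} nmol per reaction")
print(f"each primer: {primer_pmol_each:.0f} pmol per reaction")
print(f"DNA        : {dna_ng:.0f} ng per reaction")
print(f"Taq        : {taq_units:.2f} U per reaction")
print(f"programmed hold time: {total_minutes} min")
```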

Example 6 

Use of Nested GBA™ Primers on PCR-Generated Templates 

The performance of the Nested GBA™ method is assessed using PCR- 
amplified genomic DNA as the target for analysis. At least two overlapping 
PCR primer pairs are designed and tested on wild-type and mutant-containing 
genomic DNAs (five total), and the resultant PCR products tested by N-GBA™ 
on the SCAN™ arrays produced in Example 5. The PCR products will be 
evaluated for hybridization and extension efficiencies relative to the synthetic 
templates of Example 5 to ensure that analysis of PCR products is equally 
robust. 

Example 7 
Analysis of Primer Extension at Position of 3' 
Terminal Nucleotide Mismatch 

An experiment was performed in order to determine the capacity of 
various polymerases to extend a primer having a mismatch at its 3' terminus. 
Two 6-mer primers were prepared and were separately hybridized to each 
member of a set of four template molecules whose sequences differed only in 
the identity of the 6th nucleotide, as shown in Table 14. In Table 14, "X" 
denotes the 3' terminal nucleotide of the primer; "Y" denotes the nucleotide of 
the template that is opposite to "X" when the primer and template are 
hybridized to one another. 





Table 14 

SEQ ID NO   Molecule       Sequence       Nucleotide X   Nucleotide Y 

            Primer 4248    TATGGC         C 
57          Template T1    CGGTTACCATA                   A 
58          Template T2    CGGTTCCCATA                   C 
59          Template T3    CGGTTGCCATA                   G 
60          Template T4    CGGTTTCCATA                   T 

            Primer 4249    TATGGA         A 
57          Template T1    CGGTTACCATA                   A 
58          Template T2    CGGTTCCCATA                   C 
59          Template T3    CGGTTGCCATA                   G 
60          Template T4    CGGTTTCCATA                   T 


Thus, for each primer, the efficiency and capacity of extension was 
determined using four parallel reactions, in which three comprised efforts to 
extend a mismatched 3' terminus, and one comprised a control in which the 3' 
terminal nucleotide of the primer was correctly base paired. Extension was 
determined by GBA™ reaction. 

The results of this experiment are shown in Table 15, with respect to four 
polymerases: "K" (Klenow), "exo-K" (exo-Klenow), "Bst" (Bst polymerase) and 
"Therm" (Thermosequenase). The data are expressed in optical density units. 
Table 15 shows that Thermosequenase did not extend primers whose 3' 
terminal nucleotides were not base paired to the template. In contrast, 
Klenow and Exo-Klenow were both able to incorporate label onto the 3' 
terminus of 3' terminally mismatched primers, consistent with the data 
presented above. 
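
The design of this experiment can be summarized in a short sketch that classifies each primer/template combination by whether the primer's 3' terminal nucleotide (X) is base paired with the template nucleotide opposite it (Y). The function and variable names are hypothetical, and the sketch assumes that the 6-mer primers anneal flush with the 3' end of the 11-mer templates, as the sequences in Table 14 suggest.

```python
# Illustrative sketch; names are hypothetical. Assumes the primer anneals
# flush with the 3' end of the template.
PAIRS_WITH = {"A": "T", "C": "G", "G": "C", "T": "A"}

TEMPLATES = {
    "T1": "CGGTTACCATA",
    "T2": "CGGTTCCCATA",
    "T3": "CGGTTGCCATA",
    "T4": "CGGTTTCCATA",
}
PRIMERS = {"X=C": "TATGGC", "X=A": "TATGGA"}


def terminal_pair(primer: str, template: str):
    """Return (X, Y, matched) for the 3' terminal position of the primer."""
    x = primer[-1]
    y = template[len(template) - len(primer)]  # template base opposite X
    return x, y, PAIRS_WITH[y] == x


def next_base(primer: str, template: str) -> str:
    """Base a polymerase would add to a fully matched primer."""
    return PAIRS_WITH[template[len(template) - len(primer) - 1]]


matched = {}
for primer_name, primer in PRIMERS.items():
    for template_name, template in TEMPLATES.items():
        x, y, ok = terminal_pair(primer, template)
        status = "matched control" if ok else "3' mismatch"
        print(f"{primer_name} on {template_name}: X:Y = {x}:{y} ({status})")
        if ok:
            matched[primer_name] = (template_name, next_base(primer, template))
```

Under these assumptions each primer has exactly one matched control template, and the base expected on the matched primer is A in both cases, so a strong A signal on a mismatched combination indicates extension from (or after repair of) a mismatched 3' terminus.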
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Table 15 = 


Primer / 
JL cxnpiate 




Ubeied 

Nucleotide 
Present 




DNA Polymerase 

Employed 
exoK Therm Bst 


4748/Tl 


Ci\ 


A 


0.70 


1 80 










C 


0.30 






A TC 

O.Zd 






G 


n fin 


030 










T 




0.60 


0.45 


0.25 


4748/T2 


C:C 


A 


u./u 


n ^A 


U.4U 


028 






c 


030 


A XC 


0J25 


1 0.21 








l.oO 




A AT 


0.37 






T 


A CA 


030 


0.25 


0.20 1 


4748 /T3 


C:G 


A 


X./U 


230 


2.10 


2.80 






C 


1.00 


0.60 


0.40 


025 








0.90 


020 


030 


020 






T 


0^0 


030 


022 


0.18 


4748 /T4 


C:T 


A 


230 


2.10 


0.75 


1.10 






C 


0.30 


0.70 


025 


021 






G 


130 


0.40 


0.35 


030 






T 


030 


0.30 


0.15 


020 


4749/Tl 


A:A 


A 


0.80 


030 


0.40 


0.30 






C 


030 


030 


030 


0.30 






G 


130 


030 


0.40 


0.30 






T 


2.10 


0.70 


030 


030 


4749/T2 


A:C 


A 


1.00 


1.10 


0.30 


0.40 




1 C 


0.30 


0.40 


030 


0.30 






G 
T 


230 


0.70 


0.50 


0.50 


4749/T3 


A:G 


A 

C 
G 


0.60 

0.90 
3.00 


0.41) 

0.50 
U.90 


0.75 

0.50 
0.7U 


0.20 

0.40 
0.30 


4749/T4 


A:T 


T 

A 

C 
G 


1.40 

0.70 

3.00 
030 
1.20 


0.60 
0.40 

3.00 
0.30 
030 


0.40 
0.30 

2.00 
0.30 
0.30 


0.40 
0.20 

2.40 
020 




T 


0.60 


0.3O 


0.30 


0.30 
0.20 

While the invention has been described in connection with specific 
embodiments thereof, it will be understood that it is capable of further 
modifications and this application is intended to cover any variations, uses, or 
adaptations of the invention following, in general, the principles of the 
invention and including such departures from the present disclosure as come 
within known or customary practice within the art to which the invention 
pertains and as may be applied to the essential features hereinbefore set forth 
and as follows in the scope of the appended claims. 
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SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(i) APPLICANT: BOYCE-JACINO, MICHAEL T.
ROGERS, YU-HUI
GOELET, PHILIP

(ii) TITLE OF INVENTION: METHOD FOR DETERMINING THE
NUCLEOTIDE SEQUENCE OF A
POLYNUCLEOTIDE

(iii) NUMBER OF SEQUENCES: 60

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: HOWREY & SIMON
(B) STREET: 1299 PENNSYLVANIA AVENUE, N.W.
(C) CITY: WASHINGTON
(D) STATE: DC
(E) COUNTRY: US
(F) ZIP: 20004

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: US
(B) FILING DATE:
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: AUERBACH, JEFFREY I.
(B) REGISTRATION NUMBER: 32,680
(C) REFERENCE/DOCKET NUMBER: 04990.0026

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (202) 383-7451
(B) TELEFAX: (202) 383-6610
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(2) INFORMATION FOR SEQ ID NO:l: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CTTGTGCTGA CTTACCAGAT GGGAC

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TTGTGCTGAC TTACCAGATG GGACA

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TTGTGCTGAC TTACCAGATG GGACC

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TTGTGCTGAC TTACCAGATG GGACT

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

TTGTGCTGAC TTACCAGATG GGACG

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

TGTGCTGACT TACCAGATGG GACAA

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

TGTGCTGACT TACCAGATGG GACAC

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

TGTGCTGACT TACCAGATGG GACAT

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

TGTGCTGACT TACCAGATGG GACAG

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

TGTGCTGACT TACCAGATGG GACCA

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

TGTGCTGACT TACCAGATGG GACCC

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

TGTGCTGACT TACCAGATGG GACCT

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

TGTGCTGACT TACCAGATGG GACCG

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear



wo 97/35033 






(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

TGTGCTGACT TACCAGATGG GACTA

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

TGTGCTGACT TACCAGATGG GACTC

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

TGTGCTGACT TACCAGATGG GACTT



(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

TGTGCTGACT TACCAGATGG GACTG

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

TGTGCTGACT TACCAGATGG GACGA

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

TGTGCTGACT TACCAGATGG GACGC

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

TGTGCTGACT TACCAGATGG GACGT

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

TGTGCTGACT TACCAGATGG GACGG

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

GTGCTGACTT ACCAGATGGG ACAAA 25

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

CCGTACTCCC ATCTGGTAAG TCAGCACAAG 30

(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE

(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TGTGCTGACT TACCAGATGG GACAC 25

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

GTGCTGACTT ACCAGATGGG ACACT 25

(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

TGCTGACTTA CCAGATGGGA CACTC

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

GCTGACTTAC CAGATGGGAC ACTCT

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

GTCATTAATG CTATGCAGAA AATCT

(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

TCATTAATGC TATGCAGAAA ATCTT

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

CATTAATGCT ATGCAGAAAA TCTTA 25

(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

ATTAATGCTA TGCAGAAAAT CTTAG 25

(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

TTAATGCTAT GCAGAAAATC TTAGA 25

(2) INFORMATION FOR SEQ ID NO:35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

TAATGCTATG CAGAAAATCT TAGAG 25

(2) INFORMATION FOR SEQ ID NO:36:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

AAAGAAATTC TTGCTCGTTG ACCTC

(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

AAGAAATTCT TGCTCGTTGA CCTCC

(2) INFORMATION FOR SEQ ID NO:38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

AGAAATTCTT GCTCGTTGAC CTCCA

(2) INFORMATION FOR SEQ ID NO:39:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

GAAATTCTTG CTCGTTGACC TCCAC

(2) INFORMATION FOR SEQ ID NO:40:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

AAATTCTTGC TCGTTGACCT CCACT

(2) INFORMATION FOR SEQ ID NO:41:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

TTCTTGGAGA AGGTGGAATC ACACT
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(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

TTGGAGAAGG TGGAATCACA CTGAG

(2) INFORMATION FOR SEQ ID NO:45:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
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(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

TCTTGGAGAA GGTGGAATCA CACTG

(2) INFORMATION FOR SEQ ID NO:43:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

CTTGGAGAAG GTGGAATCAC ACTGA
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 
TGGAGAAGGT GGAATCACAC TGAGT 25 

(2) INFORMATION FOR SEQ ID NO:46:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 49 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

CCAGAAGAAA GGGCCTTCAC AGTGTCCTTT ATGTAAGAAT GATATAACC 49

(2) INFORMATION FOR SEQ ID NO:47:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 49 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:

CCAGAAGAAA GGGCCTTCAC AGGGTCCTTT ATGTAAGAAT GATATAACC 49

(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

GGTTATATCA TTCTTACATA AAGG

(2) INFORMATION FOR SEQ ID NO:49:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

GTTATATCAT TCTTACATAA AGGA

(2) INFORMATION FOR SEQ ID NO:50:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

TTATATCATT CTTACATAAA GGAC

(2) INFORMATION FOR SEQ ID NO:51:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

TATATCATTC TTACATAAAG GACA

(2) INFORMATION FOR SEQ ID NO:52:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

ATATCATTCT TACATAAAGG ACAC

(2) INFORMATION FOR SEQ ID NO:53:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:

TCTTAGAGTG TCCCATCTGG TAAGTCAGCA CAAG 34

(2) INFORMATION FOR SEQ ID NO:54:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: EXON 23 BRCA1 GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:

GACACTCTAA GATTTTCTGC ATAGCATTAA TGAC 34

(2) INFORMATION FOR SEQ ID NO:55:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:

CTGAGTGGAG GTCAACGAGC AAGAATTTCT TT

(2) INFORMATION FOR SEQ ID NO:56:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(vi) ORIGINAL SOURCE:
(A) ORGANISM: HOMO SAPIENS
(vii) IMMEDIATE SOURCE:
(B) CLONE: CYSTIC FIBROSIS GENE
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:

TCCACTCAGT GTGATTCCAC CTTCTCCAAG AA

(2) INFORMATION FOR SEQ ID NO:57:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:

CGGTTACCAT A 11

(2) INFORMATION FOR SEQ ID NO:58:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:

CGGTTCCCAT A 11

(2) INFORMATION FOR SEQ ID NO:59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:

CGGTTGCCAT A 11

(2) INFORMATION FOR SEQ ID NO:60:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:

CGGTTTCCAT A 11
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WHAT IS CLAIMED IS: 

1. A method for determining the nucleotide sequence of a nucleic acid
molecule which comprises the steps of:

(A) arraying a set of nested primer oligonucleotides onto a solid
support, each array position containing a different array member having a
predetermined sequence;

(B) incubating oligonucleotides of said array in the presence of a
preparation of said nucleic acid molecules, a polymerase and at least one chain
terminator nucleotide; wherein said incubation is under conditions sufficient to
permit DNA hybridization to occur between the oligonucleotides of said
incubation and said nucleic acid molecules; wherein said incubation is
conducted in the substantial absence of any non-chain terminator nucleotides;

(C) (1) in the case wherein the 3' terminal nucleotide of an
oligonucleotide is hybridized to said nucleic acid molecule, permitting
oligonucleotides hybridized to nucleic acid molecules to be extended by
polymerase-mediated incorporation of a single chain terminator nucleotide
residue onto the 3' terminus of said hybridized oligonucleotide, wherein for
each hybridized oligonucleotide being so extended, said incorporated
nucleotide residue is complementary to the nucleotide residue immediately 5'
to the nucleotide residue of the nucleic acid molecule that is hybridized with
that oligonucleotide's 3' terminal nucleotide residue; then performing step (D);

(2) in the case wherein the 3' terminal nucleotide of an
oligonucleotide is not hybridized to said nucleic acid molecule, either:

(a) not permitting oligonucleotides hybridized to
nucleic acid molecules to be extended by polymerase-mediated incorporation of
a single chain terminator nucleotide residue onto the 3' terminus of said
hybridized oligonucleotide, or

(b) permitting the removal of any non-hybridized
nucleotide residues from the 3' terminus of said hybridized oligonucleotides, so
as to form a truncated primer oligonucleotide whose 3' terminus is hybridized
to said nucleic acid molecule, and then permitting polymerase-mediated
incorporation of a single chain terminator nucleotide residue onto the 3'
terminus of said hybridized truncated oligonucleotide, wherein for each
hybridized truncated oligonucleotide being so extended, said incorporated
nucleotide residue is complementary to the nucleotide residue immediately 5'
to the nucleotide residue of the nucleic acid molecule that is hybridized with
that truncated oligonucleotide's 3' terminal nucleotide residue; then performing
step (D);

(D) determining, at each array position at which an oligonucleotide has
incorporated a single chain terminator nucleotide residue, the identity of the
incorporated chain terminator nucleotide residue; and

(E) determining the nucleotide sequence of said nucleic acid molecule
from the determined identity of the incorporated nucleotide of primer
oligonucleotides of said array, and known sequence of the oligonucleotide at
each array position.
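
By way of illustration only, steps (A) through (E) of claim 1 may be modeled in software as follows. This is a minimal sketch under stated assumptions, not the claimed laboratory procedure: hybridization is modeled as an exact string match (so a primer with a mismatched 3' terminus is simply not extended, as in step (C)(2)(a)), the 30-base strand is an arbitrary illustrative string, and the function names `nested_primers`, `extend_by_one` and `read_sequence` are hypothetical.

```python
# Watson-Crick pairing used to model hybridization and base incorporation.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def nested_primers(strand, length):
    """Step (A): primers of equal length, each offset by one base from the last."""
    return [strand[i:i + length] for i in range(len(strand) - length + 1)]


def extend_by_one(primer, template):
    """Steps (B)-(D): return the single terminator base added to the primer.

    The primer binds where its reverse complement occurs in the template.
    The incorporated base is complementary to the template base immediately
    5' of the base paired with the primer's 3' terminus. Returns None when
    the primer does not hybridize or no template base remains to be copied.
    """
    site = reverse_complement(primer)
    pos = template.find(site)
    if pos <= 0:
        return None
    return COMPLEMENT[template[pos - 1]]


def read_sequence(primers, template):
    """Step (E): join the base calls of successive nested primers."""
    calls = [extend_by_one(primer, template) for primer in primers]
    return "".join(base for base in calls if base is not None)


# Hypothetical example: an illustrative strand and the template that pairs with it.
strand = "CTTGTGCTGACTTACCAGATGGGACACTCT"
template = reverse_complement(strand)
primers = nested_primers(strand, 25)
print(read_sequence(primers, template))
```

In this sketch each primer that has a template base beyond its 3' terminus reports exactly one base, and the known offset of each primer in the array places that base in the reconstructed sequence.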

2. The method of claim 1, wherein each array position contains a primer
oligonucleotide that is capable of hybridizing to a region of said nucleic acid
molecule.

3. The method of claim 1, wherein said polymerase is a Thermosequenase
class polymerase.

4. The method of claim 1, wherein said polymerase is a Klenow class
polymerase.



wo 97/35033 



PCT/US97/03701 






5. The method of claim 4, wherein in step (C), at least some array positions
contain nucleic acid molecules hybridized to said oligonucleotides whose 3'
terminal nucleotide is not hybridized to the nucleic acid molecule, and wherein
step (C)(1) is conducted for such oligonucleotides.

6. The method of claim 1, wherein said array is a random oligonucleotide
array.

7. The method of claim 1, wherein said array is a nested oligonucleotide
array.

8. The method of claim 7, wherein said nested array contains
oligonucleotide members having all possible permutations of nucleotides over a
region of from 1 to 20 bases.
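
For illustration, the set of array members recited in claim 8 can be enumerated as below. This is a sketch under assumptions: the variable region is taken to sit at the 3' end of a fixed prefix, and the function name `variable_region_primers` is hypothetical. The 23-base prefix and the base order A, C, T, G are chosen because SEQ ID NOS:6-21 of the sequence listing appear to follow that pattern, sharing one prefix and ending in each of the 16 possible two-base combinations.

```python
from itertools import product


def variable_region_primers(fixed_prefix, n, alphabet="ACTG"):
    """Return fixed_prefix followed by every possible n-base 3' region."""
    return [fixed_prefix + "".join(bases) for bases in product(alphabet, repeat=n)]


def member_count(n, alphabet_size=4):
    """Number of array members needed to cover an n-base variable region."""
    return alphabet_size ** n


# Hypothetical usage: a two-base variable region gives 16 array members.
prefix = "TGTGCTGACTTACCAGATGGGAC"
members = variable_region_primers(prefix, 2)
print(len(members), members[0], members[-1])
```

The count grows as 4 to the power of the region length, so covering a full 20-base region exhaustively would require about 1.1 trillion distinct members, whereas short regions remain small enough to array directly.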

9. The method of claim 1, wherein said method is conducted in the
presence of at least four chain terminator nucleotide species, at least one of
which is labeled.

10. The method of claim 9, wherein all of said chain terminator nucleotide
species are labeled, and wherein the label of any such species can be
distinguished from the label of any other species present.

11. The method of claim 1, wherein said nucleic acid molecule is a DNA
molecule.

12. The method of claim 1, wherein said nucleic acid molecule is RNA.
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13. The method of claim 1, wherein said method is performed both on said 
nucleic acid molecule, and on a complement of said nucleic acid molecule.
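
As a sketch of how results from both strands might be combined in software, the base calls obtained from the complement can be reverse-complemented and compared with the calls from the first strand. The comparison rule, the use of "N" for a position where the two strands disagree, and the function name `consensus` are assumptions made for this example; the claim itself does not prescribe how the two determinations are reconciled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def consensus(strand_read, complement_read):
    """Merge a read of one strand with a read of its complement.

    Both reads are written 5'->3' and are assumed to cover the same region.
    Positions where the two strands agree keep their base; positions where
    they disagree are reported as "N".
    """
    if len(strand_read) != len(complement_read):
        raise ValueError("reads must cover the same number of bases")
    expected = reverse_complement(complement_read)
    return "".join(a if a == b else "N" for a, b in zip(strand_read, expected))


# Hypothetical usage: the second call flags the position where the strands differ.
print(consensus("ACTCT", "AGAGT"))
print(consensus("ACTCT", "AGGGT"))
```

Agreement between the two strands gives an independent check on each base call, and a flagged position indicates where a determination may merit repeating.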

14. The method of claim 11, wherein said DNA is genomic DNA of a human
or non-human mammal.

15. The method of claim 11, wherein said DNA is human genomic DNA.

16. The method of claim 15, wherein said DNA is suspected to contain a
genetic variation associated with a disease, and said method is employed to
determine whether said DNA contains said variation.

17. The method of claim 16, wherein said disease is cancer or cystic fibrosis.

18. The method of claim 1, wherein said oligonucleotides are immobilized
onto said solid support.

19. The method of claim 18, wherein said support is plastic or glass.

20. A kit for determining the sequence of a nucleic acid molecule which 
comprises a solid support containing an array of spaced apart receptacles for 
oligonucleotides, each receptacle containing a different primer oligonucleotide. 

21. The kit of claim 20, wherein each array receptacle additionally contains
at least four chain terminator nucleotide species, at least one of which is labeled.
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22. The kit of claim 20, wherein all of said chain terminator nucleotide 
species are labeled, and wherein the label of any such species can be 
distinguished from the label of any other species present. 

23. The kit of claim 20, wherein said kit determines the nucleotide sequence 
of DNA suspected to contain a genetic variation associated with a disease, and 
wherein said kit permits sufficient determination of nucleotide sequence to 
determine whether said DNA contains said variation. 

24. The kit of claim 23, wherein said disease is cancer or cystic fibrosis. 
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